ON OPTIMUM TESTS OF COMPOSITE HYPOTHESES WITH ONE
CONSTRAINT¹


By E. L. LEHMANN 
University of California, Berkeley 


Summary. This paper is concerned with optimum tests of certain composite 
hypotheses. In section 2 various aspects of a theorem of Scheffé concerning type
$B_1$ tests are discussed. It is pointed out that the theorem can be extended to
cover uniformly most powerful tests against a one-sided set of alternatives. 
It is also shown that the method for determining explicitly the optimum test 
region may in certain cases be reduced to a simple formal procedure. These 
results are used in section 3 to obtain optimum tests for the composite hypothesis 
specifying the value of the circular serial correlation coefficient in a normal 
distribution. A surprising feature of this example is the fact that for the simple 
hypothesis obtained by specifying values for the nuisance parameters no test 
with the corresponding optimum properties exists. 

In section 4 the totality of similar regions is obtained for a large class of prob- 
ability laws which admit a sufficient statistic. Some composite hypotheses 
concerning exponential and rectangular distributions are treated in section 5. 
It is proved that the likelihood ratio tests of these hypotheses have various op- 
timum properties. 


1. Introduction. In developing tests for a class of hypotheses three phases 
may be distinguished. First, tests are obtained which are intuitively appealing; 
next, it is shown that these tests have certain attractive features; finally, it is 
proved that they are "best possible" tests.

In dealing with parametric hypotheses, the likelihood ratio principle is fre- 
quently used to obtain a reasonable test. For many of the tests so derived for 
normal and exponential distributions, the question of bias has been investigated. 
In most cases unbiasedness has been established; in the other cases a test
based on the same criterion, but with the boundaries shifted, can usually be proved to be
unbiased. Other desirable properties which likelihood ratio tests have been
shown to possess relate to the asymptotic behaviour of these tests as the sample
sizes tend to infinity. An interesting problem which does not seem to have been
treated is the question of admissibility of likelihood ratio tests, a test being
admissible if its power cannot be improved upon uniformly by any other test of
the same level of significance.

Investigations of optimum tests of composite hypotheses have been carried 
through for many hypotheses concerning normal distributions. When the hy- 
pothesis specifies the value of one parameter (hypothesis with one constraint), 
uniformly most powerful one-sided and type $B_1$ (uniformly most powerful
unbiased) tests have been obtained. When the number of constraints is larger
than one, not so much can be expected. It has been shown for some of the tests
in this class that they have maximum average power uniformly over a family of
surfaces in the parameter space, or that they are uniformly most powerful with
respect to the subclass of tests whose power depends only on some function of
the parameters. (All optimum properties mentioned are relative only to the
class of all similar regions. This will be so throughout the paper and will usually
not be stated explicitly.)

1 Presented at a meeting of the Institute of Mathematical Statistics in San Diego, June,
1947.

Two methods for finding uniformly most powerful or uniformly most powerful
one-sided regions and type $B_1$ tests, if they exist, are known. Neyman and
Pearson [1] developed a method for determining all similar regions, and applied it
to obtain uniformly most powerful one-sided tests of certain hypotheses.
Neyman [2, 3] extended the method to obtain, for certain hypotheses, the class of all
bisimilar (unbiased similar) regions, and Scheffé [4], developing the method
further, proved the existence of type $B_1$ tests for an important class of hypotheses.

A different method for obtaining all similar and bisimilar regions was devised 
by P. L. Hsu and was used by him and other writers to prove various optimum 
properties of the likelihood ratio tests for the general linear hypothesis, of
Hotelling's $T^2$ and of other tests [5, 6, 7, 8].

In the present paper we are concerned with applications of these two methods 
to composite hypotheses with one constraint. However, the applicability is not 
so restricted. In fact, the second method has been used mainly in connection 
with composite hypotheses with many constraints, and the author believes it to 
be suitable also for deriving optimum classification procedures. An essential 
restriction of both methods seems to be that a set of sufficient statistics must exist 
with respect to the parameters involved: with respect to the nuisance parameters 
so that all similar regions can be found, with respect to the parameters specified 
by the hypothesis so that there exists a best of all similar regions. 

Extensions of the existing theory based on the first method are obtained in 
section 2, and the theory is applied in section 3 to a hypothesis concerning a mul- 
tivariate normal distribution. Sections 4 and 5 are concerned with applications 
of the second method to problems to most of which the earlier method is not 
applicable, in particular to hypotheses concerning exponential and rectangular 
distributions, hitherto only treated from the likelihood ratio point of view. 


2. On the theory of optimum tests. 

2.1 One-sided tests. In an interesting paper [4], Scheffé determined the type
$B$ and type $B_1$ tests of a certain class of composite hypotheses specifying the
value $\theta_1^0$ of a parameter $\theta_1$ in the presence of nuisance parameters.


Scheffé's results can, in an obvious way, be extended to cover one-sided sets
of alternatives. To show this, consider the method used in [4]. Under certain
assumptions all tests² are found which satisfy the two conditions:
(a) The power function $\beta_w$ at $\theta_1^0$ has a preassigned value $\epsilon$ (the level of
significance), independent of the nuisance parameters;
(b) the power function at $\theta_1^0$ has derivative 0. (Condition of unbiasedness.)
Then that test $w_0$ is determined for which, of all those satisfying (a) and (b),
(c) the second derivative at $\theta_1^0$, $\beta_w''(\theta_1^0)$, is as large as possible.
By definition $w_0$ is a type $B$ test. Under a certain additional assumption (this
is the convexity assumption of Scheffé's Theorem 2) it is shown that of all
tests satisfying (a) and (b), $w_0$ has maximum power against all alternatives,
i.e. is of type $B_1$.
If now we want to maximize the power against only the one-sided set of
alternatives $\theta_1 > \theta_1^0$, we determine that test $w_1$, of all those satisfying (a), for which
(d) the first derivative at $\theta_1^0$, $\beta_w'(\theta_1^0)$, is as large as possible.
Under a certain additional assumption (in Scheffé's notation this would be a
monotonicity assumption) it can then be shown that of all tests satisfying
(a), $w_1$ has maximum power against all alternatives $\theta_1 > \theta_1^0$ (it also has
minimum power against all alternatives $\theta_1 < \theta_1^0$), i.e. $w_1$ is uniformly most
powerful against alternatives $\theta_1 > \theta_1^0$. We shall not carry through the discussion
in detail since Scheffé's argument applies step by step, with only the obvious
changes.

2 The terms "the test w" and "the region [of rejection] w" will be used interchangeably.

2.2 Determination of the boundaries. Let $X_1, \dots, X_n$ be n random variables
with a joint probability density function p, depending on parameters $\theta_1$ and
$\theta = (\theta_2, \dots, \theta_l)$. We shall denote the probability density function of a set of
random variables $X_1, \dots, X_n$ whose distribution depends on a parameter $\theta$ by
$p(x_1, \dots, x_n \mid \theta)$ or simply by $p(x_1, \dots, x_n)$ when the dependence on $\theta$ is
clear from the context. The set of points $(x_1, \dots, x_n)$ for which
$p(x_1, \dots, x_n \mid \theta)$ is positive we shall denote by $W_+(\theta)$.
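Let

$$\phi_i(x_1, \dots, x_n) = \frac{\partial}{\partial\theta_i} \log p(x_1, \dots, x_n \mid \theta_1, \theta)\Big|_{\theta_1 = \theta_1^0} \qquad (i = 1, \dots, l), \tag{2.1}$$

and let the random variable $\Phi_i$ be defined by

$$\Phi_i = \phi_i(X_1, \dots, X_n). \tag{2.2}$$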


Then for testing the hypothesis $H: \theta_1 = \theta_1^0$, under the assumptions stated by
Scheffé, the type $B_1$ test $w_0$ is defined by the inequalities

$$\phi_1 < k_1, \; > k_2 \qquad (k_1 < k_2) \tag{2.3}$$

where $k_1$, $k_2$ depend on $\theta_1^0$, $\theta$, $\phi_2, \dots, \phi_l$ and are determined by the two
equations³

$$\int_{k_1}^{k_2} \phi_1^s \, p(\phi_1, \dots, \phi_l) \, d\phi_1 = (1 - \epsilon) \int_{-\infty}^{\infty} \phi_1^s \, p(\phi_1, \dots, \phi_l) \, d\phi_1 \qquad (s = 0, 1). \tag{2.4}$$

3 Although $k_1$ and $k_2$ may depend on $\theta$, $w_0$ is independent of $\theta$, as was shown in [4].

The equations (2.3) and (2.4) are not suitable for the determination of the
boundary of $w_0$. The variables have to be transformed so as to obtain for
$w_0$ an expression from which the calculation of the boundaries becomes feasible
(cf. [9]). This part of the work may be formalized in the following theorem.
THEOREM 1. Let

$$U = f(\phi_1, \dots, \phi_l), \qquad V_i = g_i(\phi_2, \dots, \phi_l) \quad (i = 2, \dots, l), \tag{2.5}$$

be a system of functions, continuously differentiable and with non-vanishing
Jacobian almost everywhere, and such that
(i) U is a linear function of $\Phi_1$,

$$U = a\Phi_1 + b \tag{2.6}$$

with coefficients which may depend on $\Phi_2, \dots, \Phi_l$ and such that⁴ $a(\Phi_2, \dots, \Phi_l) > 0$;
(ii) it is possible to solve for $\Phi_2, \dots, \Phi_l$ in terms of the V's;
(iii) under the hypothesis H, U is distributed independently of
$V = (V_2, \dots, V_l)$.
Then the region $w_0$ is equivalent to the region

$$u < c_1, \; > c_2 \qquad (c_1 < c_2) \tag{2.7}$$

where $c_1$, $c_2$ are determined by

$$\int_{c_1}^{c_2} u^s \, p(u) \, du = (1 - \epsilon) \int_{-\infty}^{\infty} u^s \, p(u) \, du \qquad (s = 0, 1). \tag{2.8}$$
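PROOF.

$$p(\phi_1, \dots, \phi_l) = p(u, v_2, \dots, v_l) \left| \frac{\partial(u, v_2, \dots, v_l)}{\partial(\phi_1, \dots, \phi_l)} \right| = p(u, v_2, \dots, v_l) \left| \frac{\partial u}{\partial \phi_1} \right| \cdot \left| \frac{\partial(v_2, \dots, v_l)}{\partial(\phi_2, \dots, \phi_l)} \right|. \tag{2.9}$$

But

$$u = a(\phi_2, \dots, \phi_l)\,\phi_1 + b(\phi_2, \dots, \phi_l) = \alpha(v_2, \dots, v_l)\,\phi_1 + \beta(v_2, \dots, v_l) \tag{2.10}$$

so that (2.4) reduces to

$$\int_{c_1(v_2, \dots, v_l)}^{c_2(v_2, \dots, v_l)} \left( \frac{u - \beta}{\alpha} \right)^s p(u)\, p(v_2, \dots, v_l) \, du = (1 - \epsilon) \int_{-\infty}^{\infty} \left( \frac{u - \beta}{\alpha} \right)^s p(u)\, p(v_2, \dots, v_l) \, du \qquad (s = 0, 1) \tag{2.11}$$

and hence to

$$\int_{c_1(v_2, \dots, v_l)}^{c_2(v_2, \dots, v_l)} u^s \, p(u) \, du = (1 - \epsilon) \int_{-\infty}^{\infty} u^s \, p(u) \, du \qquad (s = 0, 1) \tag{2.12}$$

which shows $c_1$ and $c_2$ to be independent of the v's. Also obviously (2.3)
transforms into (2.7), which completes the proof.

4 A similar theorem holds when we assume $a(\Phi_2, \dots, \Phi_l) < 0$.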

If U is such that its distribution (when $\theta_1 = \theta_1^0$) is independent of $\theta$, $c_1$ and $c_2$
of Theorem 1 will depend only on the data of the problem: $\epsilon$, n, $\theta_1^0$. However, the
existence of constants $c_1$ and $c_2$ satisfying (2.8) still has to be proved. We may
show more generally the existence of $k_1$ and $k_2$ satisfying (2.4). A proof is
immediately supplied by an argument which was used by Neyman [10] and Wald
[11] to prove the existence of type A tests, and which may be stated in the
following

LEMMA. Let $0 < \alpha < 1$, let $f(x) > 0$ and $\int_{-\infty}^{\infty} x^s f(x)\, dx < \infty$ for s = 0, 1. Then
there exist A, B such that

$$\int_A^B x^s f(x)\, dx = \alpha \int_{-\infty}^{\infty} x^s f(x)\, dx \qquad (s = 0, 1). \tag{2.13}$$
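The lemma only asserts existence, but for a given density the two limits can be found numerically. The following is a minimal sketch, not part of the original argument: it assumes the standard Gumbel density as an illustrative choice of f (strictly positive, with finite first moment), approximates the first-moment integrals by Simpson's rule on a truncated range, and locates A and B by bisection. All function names are hypothetical.

```python
import math

def gumbel_pdf(x):
    # Illustrative choice of f: standard Gumbel density, positive on the whole line.
    return math.exp(-x - math.exp(-x))

def gumbel_cdf(x):
    return math.exp(-math.exp(-x))

def gumbel_quantile(q):
    return -math.log(-math.log(q))

def simpson(g, lo, hi, n=2000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def first_moment(lo, hi):
    # Integral of x f(x) over [lo, hi].
    return simpson(lambda x: x * gumbel_pdf(x), lo, hi)

def solve_lemma(alpha, lo=-10.0, hi=40.0):
    """Return (A, B) satisfying (2.13) for s = 0 and s = 1 (numerically)."""
    m1 = first_moment(lo, hi)  # truncated stand-in for the full first moment

    def gap(q):
        # With F(A) = q, the s = 0 equation forces F(B) = q + alpha.
        a, b = gumbel_quantile(q), gumbel_quantile(q + alpha)
        return first_moment(a, b) - alpha * m1

    q_lo, q_hi = 1e-12, 1.0 - alpha - 1e-12
    for _ in range(60):
        q_mid = 0.5 * (q_lo + q_hi)
        if gap(q_mid) < 0:
            q_lo = q_mid   # window lies too far to the left
        else:
            q_hi = q_mid
    q = 0.5 * (q_lo + q_hi)
    return gumbel_quantile(q), gumbel_quantile(q + alpha)

A, B = solve_lemma(0.9)
```

With $\alpha = 1 - \epsilon$ the interval (A, B) plays the role of the acceptance interval $(c_1, c_2)$ in (2.8); the s = 0 equation is imposed exactly through the distribution function, and only the s = 1 equation is solved iteratively.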





3. Testing for circular serial correlation in a normal population. We now
apply the results of the previous section to obtain the optimum tests (i.e.
uniformly most powerful against the one-sided set of alternatives, type $B_1$ in the
two-sided case) for the hypothesis specifying the value of the circular serial
correlation coefficient in the normal population considered by Dixon [12]. (For
the literature on testing for non-circular serial correlation in normal populations
cf. [12].)
We assume

$$p(x_1, \dots, x_n) = \frac{1 - \delta^n}{(\sqrt{2\pi}\,\sigma)^n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n \big[ (x_i - \xi) - \delta (x_{i+1} - \xi) \big]^2 \right\} \tag{3.1}$$

where $x_{n+1} = x_1$ and $|\delta| < 1$, and we test the hypothesis $\delta = \delta_0$. For testing
purposes only the value $\delta_0 = 0$ is presumably of interest; however, the family of
tests for arbitrary $\delta_0$ is required for estimating $\delta$ by means of confidence intervals,
and therefore the more general hypothesis is considered.
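Making a transformation in one of the parameters we write

$$p(x_1, \dots, x_n) = C(\delta, \alpha) \exp\left\{ \alpha \Big[ (1 + \delta^2) \sum (x_i - \xi)^2 - 2\delta \sum (x_i - \xi)(x_{i+1} - \xi) \Big] \right\} \tag{3.2}$$

where in the notation of the previous section $\theta_1 = \delta$, $\theta_2 = \alpha$, $\theta_3 = \xi$.
THEOREM 2. For testing the hypothesis $\delta = \delta_0$ for the distribution (3.2)
(a) the type $B_1$ test exists and is given by

$$r < r_1, \; > r_2 \tag{3.3}$$

where

$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(x_{i+1} - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i, \tag{3.4}$$

and where $r_1$ and $r_2$ are determined by

$$\int_{r_1}^{r_2} \left( \frac{r}{1 + \delta_0^2 - 2\delta_0 r} \right)^s p(r)\, dr = (1 - \epsilon) \int_{-\infty}^{\infty} \left( \frac{r}{1 + \delta_0^2 - 2\delta_0 r} \right)^s p(r)\, dr \qquad (s = 0, 1); \tag{3.5}$$

(b) the uniformly most powerful similar region for testing H against the
alternatives $\delta > \delta_0$⁵ exists and is given by

$$r > r' \tag{3.6}$$

where $r'$ is determined by

$$\int_{-\infty}^{r'} p(r)\, dr = (1 - \epsilon) \int_{-\infty}^{\infty} p(r)\, dr. \tag{3.7}$$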

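PROOF. We compute

$$\begin{aligned}
\phi_1 &= C_1(\delta_0, \alpha) + 2\alpha \Big[ \delta_0 \sum (x_i - \xi)^2 - \sum (x_i - \xi)(x_{i+1} - \xi) \Big], \\
\phi_2 &= C_2(\delta_0, \alpha) + (1 + \delta_0^2) \sum (x_i - \xi)^2 - 2\delta_0 \sum (x_i - \xi)(x_{i+1} - \xi), \\
\phi_3 &= -2n\alpha (1 - \delta_0)^2 (\bar{x} - \xi).
\end{aligned} \tag{3.8}$$

There is no difficulty in checking the conditions of Scheffé's theorems [4].
Next we apply Theorem 1 of the previous section, and define

$$\begin{aligned}
V_2 &= (1 + \delta_0^2) \sum (X_i - \bar{X})^2 - 2\delta_0 \sum (X_i - \bar{X})(X_{i+1} - \bar{X}), \\
V_3 &= \bar{X} - \xi, \\
U &= \frac{\sum (X_i - \bar{X})(X_{i+1} - \bar{X})}{V_2}.
\end{aligned} \tag{3.9}$$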

Conditions (i) and (ii) of Theorem 1 are easily seen to be satisfied. To show that
U is independent of $V = (V_2, V_3)$ we employ arguments which have recently
been used by various authors in a number of similar problems (cf. [13, 14, 15]).
It is seen that an orthonormal transformation exists:

$$X_1, \dots, X_n \to Y_1, \dots, Y_n$$

such that

$$\begin{aligned}
Y_1 &= \sqrt{n}\, \bar{X}, \\
\sum_{i=1}^n (X_i - \bar{X})(X_{i+1} - \bar{X}) &= \sum_{i=2}^n \lambda_i Y_i^2, \\
\sum_{i=1}^n (X_i - \bar{X})^2 &= \sum_{i=2}^n Y_i^2.
\end{aligned} \tag{3.10}$$
5 A corresponding result holds for the other one-sided case.

Under H the Y's are distributed with probability density

$$k \exp\left\{ \alpha \Big[ \sum_{i=2}^n \mu_i y_i^2 + (1 - \delta_0)^2 (y_1 - \sqrt{n}\, \xi)^2 \Big] \right\} \tag{3.11}$$

where $k, \mu_2, \dots, \mu_n$ depend on $\delta_0$ and where the $\mu$'s are all positive. Introducing
new variables

$$Z_i = \sqrt{\mu_i}\, Y_i \qquad (i = 2, \dots, n), \tag{3.12}$$

and, then, generalized polar coordinates in the space of the Z's,

$$R = \sum_{i=2}^n Z_i^2, \qquad \Psi_1, \dots, \Psi_{n-2}, \tag{3.13}$$

we see that $Y_1$, R and $\Psi_1, \dots, \Psi_{n-2}$ are completely independent. Also

$$V_2 = R, \qquad V_3 = \frac{1}{\sqrt{n}} (Y_1 - \sqrt{n}\, \xi)$$

while U, being homogeneous of degree 0 in the Z's, is a function of the $\Psi$'s only.
This proves that U, $V_2$ and $V_3$ are completely independent. The type $B_1$ test
of H is therefore given by

$$u = \frac{\sum (x_i - \bar{x})(x_{i+1} - \bar{x})}{(1 + \delta_0^2) \sum (x_i - \bar{x})^2 - 2\delta_0 \sum (x_i - \bar{x})(x_{i+1} - \bar{x})} < c_1, \; > c_2 \tag{3.14}$$

where $c_1$ and $c_2$ are determined by

$$\int_{c_1}^{c_2} u^s p(u)\, du = (1 - \epsilon) \int_{-\infty}^{\infty} u^s p(u)\, du \qquad (s = 0, 1). \tag{3.15}$$

We still have to show that this test is equivalent to the one defined by (3.3)
and (3.5). For $\delta_0 = 0$ this is trivial. Let us assume $\delta_0 < 0$. (The other
case goes through similarly.) The inequality $u < c_1$ is equivalent to

$$(1 + 2\delta_0 c_1) \sum (x_i - \bar{x})(x_{i+1} - \bar{x}) < c_1 (1 + \delta_0^2) \sum (x_i - \bar{x})^2 \tag{3.16}$$

and hence to

$$r < \frac{c_1 (1 + \delta_0^2)}{1 + 2\delta_0 c_1} \tag{3.17}$$

provided $1 + 2c_1\delta_0 > 0$. Suppose $1 + 2c_1\delta_0 \le 0$, i.e. $c_1 \ge -\frac{1}{2\delta_0}$. Then⁶

$$P\{U < c_1\} \ge P\left\{U < -\frac{1}{2\delta_0}\right\} = P\Big\{0 < \sum (X_i - \bar{X})^2\Big\} = 1 \tag{3.18}$$

i.e. $P\{U < c_1\} = 1$, which would contradict (3.15). Similarly if $1 + 2c_2\delta_0 \le 0$
we would have $P\{U > c_2\} = 0$ and hence our test would be one-sided and
therefore not unbiased. The inequalities $u < c_1$, $> c_2$ are thus equivalent to the
inequalities $r < r_1$, $> r_2$, and since

$$u = \frac{r}{1 + \delta_0^2 - 2\delta_0 r},$$

(3.5) also follows.

6 We denote the probability of an event A by P{A}.
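The relation between the two criteria can be checked on data. The sketch below is an illustration only, with hypothetical function names: it computes the circular serial correlation coefficient r of (3.4) and the statistic u of (3.14) both directly and through $u = r/(1 + \delta_0^2 - 2\delta_0 r)$.

```python
def circular_sums(x):
    # Numerator and denominator of r in (3.4), with x_{n+1} = x_1.
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    num = sum(d[i] * d[(i + 1) % n] for i in range(n))
    den = sum(di * di for di in d)
    return num, den

def circular_serial_correlation(x):
    num, den = circular_sums(x)
    return num / den

def u_direct(x, delta0):
    # The criterion u exactly as written in (3.14).
    num, den = circular_sums(x)
    return num / ((1.0 + delta0 ** 2) * den - 2.0 * delta0 * num)

def u_from_r(x, delta0):
    # The same criterion expressed through r.
    r = circular_serial_correlation(x)
    return r / (1.0 + delta0 ** 2 - 2.0 * delta0 * r)

sample = [1.0, 2.0, 3.0, 4.0]
r_value = circular_serial_correlation(sample)
u_value = u_direct(sample, 0.5)
```

Since $du/dr = (1 + \delta_0^2)/(1 + \delta_0^2 - 2\delta_0 r)^2 > 0$ wherever the denominator is positive, ordering samples by u or by r gives the same rejection regions, which is the content of the equivalence just proved.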

The existence of type $B_1$ and uniformly most powerful one-sided tests of the
hypothesis H is rather surprising. For when $\alpha$ and $\xi$ are assumed known, neither
the type $A_1$ test nor the uniformly most powerful one-sided test of the simple
hypothesis $H'$: $\delta = \delta_0$ exists. This is easily seen by determining the most
powerful and the most powerful unbiased test against a specific alternative $\delta_1$
for the hypothesis $H'$ in the population

$$p(x_1, \dots, x_n) = \frac{1 - \delta^n}{(\sqrt{2\pi})^n} \exp\left[ -\tfrac{1}{2} \Big( (1 + \delta^2) \sum x_i^2 - 2\delta \sum x_i x_{i+1} \Big) \right]. \tag{3.19}$$


The distribution of the criterion R was obtained by R. L. Anderson [16] (see
also [17]) for the case 6 = 0. Madow [15] using Anderson’s result found the dis- 
tribution for arbitrary 6. (Approximations to the distribution have been studied 
by various authors; for the literature on this cf. [18]. Recently Hsu [19] ob- 
tained an asymptotic expansion.) A direct derivation for arbitrary 6 may be 
based on the following theorem of Cramér, which was communicated to the 
author by Dr. P. L. Hsu. 

THEOREM 3. (Cramér)⁷. If X, Y are two random variables (not necessarily
independent), Y > 0, then

$$P\left\{ \frac{X}{Y} < x \right\} = -\frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{\varphi_x(t) - \psi(t)}{t}\, dt \tag{3.20}$$

where $\varphi_x$ and $\psi$ are the characteristic functions of $X - xY$ and $Y$ respectively,
provided

$$\int_{-\infty}^{\infty} \left| \frac{\varphi_x(t) - \psi(t)}{t} \right| dt < \infty. \tag{3.21}$$

7 Differentiated forms of the theorem were given by R. C. Geary [Jour. Roy. Stat. Soc.
Vol. 107 (1944) p. 56] and H. Cramér [Exercise 6 on p. 317 of Mathematical Methods of
Statistics. Princeton Univ. Press (1946)].

THEOREM 4. If

$$p(x_1, \dots, x_n) = \frac{1 - \delta^n}{(\sqrt{2\pi}\,\sigma)^n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n \big[ (x_i - \xi) - \delta (x_{i+1} - \xi) \big]^2 \right\} \qquad (x_{n+1} = x_1) \tag{3.22}$$

and

$$R = \frac{\sum_{i=1}^n (X_i - \bar{X})(X_{i+1} - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2} \tag{3.23}$$

then
onti2 Sm 8” 
Pe > + aT 
(3.24) 


(—1) (A; — r)”*? sin JT sin 2m 
n n 


; 1 + & — 20, 
where the summation is extended over all integer j, 1 <j < > for which \; > r, and 


where 





(3.25) \; = 2 cos ous ; 









The proof of this theorem from Theorem 3 is straightforward and will only
be indicated here. If X and Y denote the numerator and denominator of R
respectively, the characteristic functions of Y and X — rY may be obtained by 
the method of circulants (cf. [12, 17]). The integral on the right hand side of 
(3.20) is then easily evaluated by the theory of residues when n is odd. In the 
case that n is even, the integrand has two branchpoints, one in the lower and one 
in the upper half plane. These may be separated, and then again the method 
of residues may be applied. 


























4. Similar regions. The problem of finding all regions similar to the sample
space with respect to a parameter $\theta$ was solved by Neyman and Pearson [1] for
a certain class of probability laws. In a later paper Neyman proved ([20],
proposition IX) that if there exists a sufficient statistic T for a parameter $\theta$,
then w is similar with respect to $\theta$ if it has the following structure: For the
intersection w(t) of w with the surface T = t, the relative probability of w(t) given
T = t has a constant value independent of t. We shall show in this section that
for a large class of probability laws which admit a sufficient statistic for $\theta$ the
regions with the above structure are the only ones that are similar with respect
to $\theta$.
We consider samples from a univariate distribution and we distinguish three
cases as one, both or neither of the extremes of the range of the distribution
depend on the parameter $\theta$. For the first of these cases (cf. Pitman [21]) we
consider samples from a distribution with probability density








$$p(x) = \frac{f(x)}{g(\theta)}, \qquad k(\theta) \le x \le c, \tag{4.1}$$

where $k(\theta)$ is a strictly monotone continuous function of $\theta$ and where c may be
infinite. Introducing a new parameter $\delta = k(\theta)$ the distribution of a sample
from (4.1) is given by

$$p(x_1, \dots, x_n) = \frac{f(x_1) \cdots f(x_n)}{[h(\delta)]^n}, \qquad \delta \le x_i \le c. \tag{4.2}$$

To obtain the totality of regions w similar with respect to $\delta$ let us denote by
$W_1, \dots, W_n$ the portions of the sample space where the smallest of the x's is
$x_1, \dots, x_n$ respectively. For any region w denote by $w_k$ the intersection of w
with $W_k$. Consider a transformation carrying $W_2, \dots, W_n$ into $W_1$, letting
$y_1 = \min(x_1, \dots, x_n)$ and letting in $W_k$:

$$y_2 = x_1, \; y_3 = x_2, \; \dots, \; y_k = x_{k-1}, \; y_{k+1} = x_{k+1}, \; \dots, \; y_n = x_n. \tag{4.3}$$
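Denote by $w_k'$ the image of $w_k$ under this transformation. The condition that
w be similar with respect to $\delta$,

$$\int_w \frac{f(x_1) \cdots f(x_n)}{[h(\delta)]^n}\, dx_1 \cdots dx_n = \epsilon, \tag{4.4}$$

may be written in the form

$$\frac{1}{[h(\delta)]^n} \int_\delta^c f(y_1) \sum_{k=1}^n \int_{w_k'(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n\, dy_1 = \frac{n\epsilon}{[h(\delta)]^n} \int_\delta^c f(y_1) \int_{W(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n\, dy_1 \tag{4.5}$$

where $W(y_1)$ denotes the region $y_1 \le y_i \le c$ $(i = 2, \dots, n)$, that is, the region
of variation of $y_2, \dots, y_n$ given $y_1$, and where $w_k'(y_1)$ denotes the region of
variation of $y_2, \dots, y_n$ given $y_1$ and $w_k'$. From (4.5) we obtain

$$\int_\delta^c f(y_1)\, \psi(y_1)\, dy_1 = 0 \tag{4.6}$$

where

$$\psi(y_1) = \sum_{k=1}^n \int_{w_k'(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n - n\epsilon \int_{y_1}^c \cdots \int_{y_1}^c f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n. \tag{4.7}$$

But (4.6) implies

$$\psi(y_1) = 0 \text{ almost everywhere} \tag{4.8}$$

and since we can only determine w up to a set of measure 0, we may omit the
qualification in (4.8). Therefore a necessary and sufficient condition for w to
be similar is

$$\frac{1}{n \int_{W(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n} \sum_{k=1}^n \int_{w_k'(y_1)} f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_n = \epsilon \tag{4.9}$$

for all $y_1$.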

To see more clearly the structure of these regions, let us take n = 2.
Equation (4.9) states that on each of the broken lines of Fig. 1 the relative probability
of $w = w_1 + w_2$ given $Y_1 = y_1$ is $\epsilon$, where the decomposition of this probability
into its two components may vary with $y_1$.

[Fig. 1: the broken lines $Y_1 = y_1$ in the sample space for n = 2.]


In general equation (4.9) states that on each hyperplane $Y_1 = y_1$ the relative
probability of w is independent of $y_1$. Since $Y_1 = \min(X_1, \dots, X_n)$ is a
sufficient statistic for $\delta$, Neyman's theorem in this case does give all similar regions.

Next let us consider the case where both extremes of the range of the
distribution depend on the parameter. We shall assume (cf. [21]) that $X_1, \dots, X_n$
are distributed with probability density

$$p(x) = \frac{f(x)}{g(\theta)} \quad \text{in } \theta \le x \le b(\theta) \tag{4.10}$$

where b is a strictly decreasing continuous function over an interval $[-\infty, b(-\infty)]$
and where $b[b(-\infty)] = -\infty$. These assumptions insure that there
exists a unique number a, $-\infty < a < b(-\infty)$, such that b(a) = a.

Denote by $W_{ij}$ $(i, j = 1, \dots, n;\ i \ne j)$ the portion of the sample space
where the smallest and the largest of the x's are $x_i$ and $x_j$ respectively. Denote
by $W_{ij1}$ and $W_{ij2}$ those portions of $W_{ij}$ where $x_i$ is greater than and less than
$b^{-1}(x_j)$ respectively. For any region w denote by $w_{ijk}$ the intersection of w with
$W_{ijk}$. Consider a transformation carrying the sample space into $W_{1n}$, letting
$y_1 = \min(x_1, \dots, x_n)$, $y_n = \max(x_1, \dots, x_n)$ and in $W_{ij}$ letting $y_2, \dots, y_{n-1}$
denote the remaining x's in the order of their subscripts. Next make a
transformation of $W_{1n}$, letting $z_1 = \max[y_1, b^{-1}(y_n)]$, $z_n = \min[y_1, b^{-1}(y_n)]$
and $z_k = y_k$ for $k = 2, \dots, n - 1$. Denote by $w_{ijk}'$ the image of
$w_{ijk}$ under these transformations.

Then $Z_n$ is a sufficient statistic for $\theta$ (cf. [21]) and there exist functions $f_1$, $g_1$
such that the density of $Z_n$ is given by

$$p(z_n) = \frac{f_1(z_n)}{g_1(\theta)} \quad \text{in } \theta < z_n < a \tag{4.11}$$

while the distribution of the remaining Z's given $Z_n$ is independent of $\theta$.


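The condition that w be a similar region may now be written, analogously to
(4.5), in the form

$$\int_\theta^a \frac{f_1(z_n)}{g_1(\theta)} \sum_{ijk} \int_{w_{ijk}'(z_n)} p(z_1, \dots, z_{n-1} \mid z_n)\, dz_1 \cdots dz_{n-1}\, dz_n = \epsilon \int_\theta^a \frac{f_1(z_n)}{g_1(\theta)}\, dz_n \tag{4.12}$$

and hence, by the argument which led to (4.6), as

$$\sum_{ijk} \int_{w_{ijk}'(z_n)} p(z_1, \dots, z_{n-1} \mid z_n)\, dz_1 \cdots dz_{n-1} = \epsilon \quad \text{for all } z_n. \tag{4.13}$$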
ijk Yw5je (zn) 


Thus in this case also Neyman’s theorem gives the most general similar region. 

For the case that neither extreme of the range of the distribution depends on
the parameter $\theta$, it has been shown by various authors [22, 21, 23] under slightly
varying assumptions concerning the regularity of the distribution function, that
the existence of a sufficient statistic implies

$$p(x \mid \theta) = \exp[P(\theta) + T(x) Q(\theta) + R(x)]. \tag{4.14}$$

This (cf. [10]) is a special case of that for which Neyman and Pearson determined
the totality of similar regions, however under the restriction that the moments
of $\Phi = \frac{\partial}{\partial\theta} \sum_{i=1}^n \log p(X_i)$ uniquely determine the distribution of $\Phi$. We shall
briefly indicate how this assumption may be avoided.

Let $X_1, \dots, X_n$ be a sample from (4.14), or, more generally (this is the case
considered by Neyman and Pearson), let $X_1, \dots, X_n$ be distributed with
probability density

$$p(x_1, \dots, x_n) = \exp[P(\theta) + u(x_1, \dots, x_n) Q(\theta) + v(x_1, \dots, x_n)] \tag{4.15}$$

in a sample space $W_+$ which is independent of $\theta$. We shall assume that the set
of values which Q takes on contains at least some interval. Introducing
$\delta = -Q(\theta)$ as a new parameter, we shall obtain all regions similar with respect to $\delta$ (where the
set of values of $\delta$ contains an interval) for the distribution⁸

$$p(x_1, \dots, x_n) = \exp[P_1(\delta) - \delta\, u(x_1, \dots, x_n) + v(x_1, \dots, x_n)] \tag{4.16}$$

under the assumption that $\sum_{i=1}^n \left( \frac{\partial u}{\partial x_i} \right)^2 \ne 0$ except possibly on a set of measure 0.
t=1 i 
Let us for a moment assume that there exist functions $f_i(x_1, \dots, x_n)$,
$(i = 2, \dots, n)$, with continuous partial derivatives almost everywhere and such
that the transformation

$$y_1 = u(x_1, \dots, x_n), \qquad y_i = f_i(x_1, \dots, x_n) \quad (i = 2, \dots, n), \tag{4.17}$$

is one to one on $W_+$ except possibly on a subset of measure 0. Applying this
transformation we may write the condition of similarity in the form

$$\int_{-\infty}^{\infty} e^{-\delta y_1} \int_{w(y_1)} F(y_1, \dots, y_n)\, dy_2 \cdots dy_n\, dy_1 = \epsilon \int_{-\infty}^{\infty} e^{-\delta y_1} \int_{W(y_1)} F(y_1, \dots, y_n)\, dy_2 \cdots dy_n\, dy_1 \tag{4.18}$$

where $W(y_1)$ denotes the region of variation of $y_2, \dots, y_n$ given $y_1$, and where
$w(y_1)$ denotes the region of variation of $y_2, \dots, y_n$ given $y_1$ and w. Furthermore
$F(y_1, \dots, y_n)$ is independent of $\delta$. From the theory of bilateral Laplace
transforms it is known that (4.18) implies that

$$\int_{w(y_1)} F(y_1, \dots, y_n)\, dy_2 \cdots dy_n = \epsilon \int_{W(y_1)} F(y_1, \dots, y_n)\, dy_2 \cdots dy_n \tag{4.19}$$

which is the desired result.

More generally it may be shown that our assumption concerning $u(x_1, \dots, x_n)$
insures the existence of functions $f_i$, $(i = 2, \dots, n)$, such that under the
transformation (4.17) no point $(y_1, \dots, y_n)$ has more than a denumerable infinity of
counter images in x-space. Our proof can be modified to cover this case. The
argument is similar to that used to obtain equations (4.9) and (4.13) which were
also arrived at through many to one transformations.


5. Testing exponential and rectangular distributions. In their fundamental
1928 paper [24] on likelihood ratio tests, Neyman and Pearson discussed various
hypotheses relating to normal, exponential and rectangular distributions. Later
they and other authors developed a theory of similar and bisimilar regions which
made it possible to obtain optimum tests of many composite hypotheses with
one constraint concerning normal populations. This theory however is not
applicable to most hypotheses concerning exponential or rectangular
distributions. We shall in this section obtain optimum tests of some hypotheses relating
to these latter distributions, using the method of the previous section.

8 An assumption that we can solve for $\theta$ as a function of $\delta$ is not needed since we can
determine $P_1(\delta)$ by integrating the density (4.16) over $W_+$.

Let us first consider a sample $X_1, \dots, X_n$ from an exponential population,
the probability density of the sample being

$$p(x_1, \dots, x_n) = \frac{1}{a^n} \exp\left[ -\frac{1}{a} \sum_{i=1}^n (x_i - b) \right] \quad \text{if } x_i \ge b \quad (i = 1, \dots, n) \tag{5.1}$$

and let us consider the two hypotheses $H_1$: $a = a_0$, $H_2$: $b = b_0$ where, without loss
of generality, we shall take $a_0 = 1$, $b_0 = 0$. The likelihood ratio tests of both
these hypotheses were shown to be completely unbiased by Paulson [25]. We
shall prove

THEOREM 5. The likelihood ratio tests of $H_1$ and $H_2$ are type $B_1$ and uniformly
most powerful, respectively. The one-sided tests based on the likelihood ratio criterion
for $H_1$ are the uniformly most powerful one-sided similar regions for testing this
hypothesis.

PROOF. In order to simplify the argument we shall give a detailed proof only
for the restricted class of tests which are symmetric in the variables $X_1, \dots, X_n$.

For testing $H_1$ let us make the following transformation introduced by
Sukhatme [26]:

$$Z_1 = nY_1, \qquad Z_i = (n - i + 1)(Y_i - Y_{i-1}) \quad (i = 2, \dots, n), \tag{5.2}$$

where $Y_i$ is the i-th of the X's in order of magnitude. Then

$$p(z_1, \dots, z_n) = \frac{1}{a^n} \exp\left[ -\frac{1}{a} \Big( \sum_{i=1}^n z_i - nb \Big) \right] \quad \text{if } z_1 \ge nb;\ z_i \ge 0 \quad (i = 2, \dots, n). \tag{5.3}$$
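As an illustration of (5.2), the following short sketch (the function name is hypothetical) computes the transformed variables from a sample and shows the two identities used later in the proof: the z's have the same sum as the x's, and $\sum_{i=2}^n z_i$ equals $\sum_{i=1}^n [x_i - \min(x_1, \dots, x_n)]$.

```python
def sukhatme_transform(x):
    """Return [z_1, ..., z_n] of (5.2) for the sample x."""
    n = len(x)
    y = sorted(x)                      # y[i-1] is the i-th order statistic Y_i
    z = [n * y[0]]                     # Z_1 = n * Y_1
    for i in range(2, n + 1):
        # Z_i = (n - i + 1) * (Y_i - Y_{i-1})
        z.append((n - i + 1) * (y[i - 1] - y[i - 2]))
    return z

sample = [3.0, 1.0, 2.5, 4.0]
z = sukhatme_transform(sample)
total_matches = abs(sum(z) - sum(sample)) < 1e-12
excess_matches = abs(sum(z[1:]) - sum(v - min(sample) for v in sample)) < 1e-12
```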


We want to determine all regions w which under $H_1$ are similar to the sample
space with respect to b, i.e. all regions w satisfying

$$\int_w e^{-(z_1 - nb)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_1 \cdots dz_n = \int_{nb}^{\infty} e^{-(z_1 - nb)} \int_{w(z_1)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n\, dz_1 = \epsilon = \epsilon \int_{nb}^{\infty} e^{-(z_1 - nb)}\, dz_1 \tag{5.4}$$

where $w(z_1)$ denotes the intersection of w with the hyperplane $Z_1 = z_1$. Now
(5.4) is equivalent to

$$e^{nb} \int_{nb}^{\infty} e^{-z_1} f(z_1)\, dz_1 = 0 \tag{5.5}$$

where

$$f(z_1) = \int_{w(z_1)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n - \epsilon \tag{5.6}$$

and this in turn is equivalent to

$$f(z_1) = 0 \quad \text{for all } z_1. \tag{5.7}$$


Of all the regions w satisfying (5.7) we want to determine the one which against
a specific alternative, say $a_1$, has maximum power, i.e. for which

$$\int_{nb}^{\infty} \frac{1}{a_1^n} \exp\left[ -\frac{z_1 - nb}{a_1} \right] \int_{w(z_1)} \exp\left[ -\frac{1}{a_1} \sum_{i=2}^n z_i \right] dz_2 \cdots dz_n\, dz_1 \tag{5.8}$$

is as large as possible. We thus see that w will have the desired properties if
$w(z_1)$ is determined according to the two conditions

$$\int_{w(z_1)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n = \epsilon \tag{5.9}$$

and

$$\int_{w(z_1)} \exp\left[ -\frac{1}{a_1} \sum_{i=2}^n z_i \right] dz_2 \cdots dz_n = \max. \tag{5.10}$$

Hence by the Neyman-Pearson fundamental lemma $w(z_1)$ is the set of points
satisfying

$$\exp\left[ \Big( 1 - \frac{1}{a_1} \Big) \sum_{i=2}^n z_i \right] > C(a_1, z_1) \tag{5.11}$$

and therefore, according as $a_1$ is greater or less than 1, $w(z_1)$ is determined by

$$\sum_{i=2}^n z_i = \sum_{i=1}^n [x_i - \min(x_1, \dots, x_n)] > k(a_1, z_1), \quad \text{or} \quad \sum_{i=2}^n z_i = \sum_{i=1}^n [x_i - \min(x_1, \dots, x_n)] < k'(a_1, z_1). \tag{5.12}$$

But $\sum_{i=2}^n Z_i$ is distributed independently of $Z_1$, and under $H_1$ the distribution of
$2\sum_{i=2}^n Z_i$ does not depend on $a_1$; in fact it is a chi-square distribution with 2n − 2
degrees of freedom. Thus k and k', as determined by (5.9), are independent of
$a_1$, and the two tests (5.12) are uniformly most powerful one-sided.
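The constants k and k' of (5.12) can be computed directly from (5.9). The sketch below is an illustration with hypothetical function names; it relies on the standard closed form for the upper tail of a chi-square distribution with an even number of degrees of freedom, so that for $S = \sum_{i=2}^n Z_i$ under $H_1$ one has $P\{S > k\} = e^{-k} \sum_{j=0}^{n-2} k^j / j!$, and it solves for the cut-off by bisection.

```python
import math

def erlang_survival(k, m):
    # P(S > k) when S is a sum of m independent unit exponentials,
    # i.e. when 2S is chi-square with 2m degrees of freedom.
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= k / j
        total += term
    return math.exp(-k) * total

def critical_value(n, tail_prob):
    """Return k with P(sum_{i>=2} Z_i > k) = tail_prob under H_1 (a = 1)."""
    m = n - 1
    lo, hi = 0.0, 1.0
    while erlang_survival(hi, m) > tail_prob:
        hi *= 2.0                      # expand until the root is bracketed
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if erlang_survival(mid, m) > tail_prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 0.05
k_upper = critical_value(5, eps)        # reject for large values (a_1 > 1)
k_lower = critical_value(5, 1.0 - eps)  # reject for small values (a_1 < 1)
```

Neither cut-off involves $a_1$ or $z_1$, in line with the conclusion that the one-sided tests are uniformly most powerful.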

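Next we consider the more restricted class of unbiased similar regions. For w
to be unbiased we must have

$$\begin{aligned}
\frac{d}{da} &\left\{ \frac{1}{a^n} \int_{nb}^{\infty} \int_{w(z_1)} \exp\left[ -\frac{1}{a} \Big( \sum_{i=1}^n z_i - nb \Big) \right] dz_2 \cdots dz_n\, dz_1 \right\}_{a = 1} \\
&= \int_{nb}^{\infty} (z_1 - nb - n) \exp[-(z_1 - nb)] \int_{w(z_1)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n\, dz_1 \\
&\quad + \int_{nb}^{\infty} \exp[-(z_1 - nb)] \int_{w(z_1)} \Big( \sum_{i=2}^n z_i \Big) \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n\, dz_1 = 0.
\end{aligned} \tag{5.13}$$

The first of the integrals in the middle member equals

$$\int_0^{\infty} (z - n) e^{-z} \int_{w(z + nb)} \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n\, dz = \epsilon \int_0^{\infty} (z - n) e^{-z}\, dz = -(n - 1)\epsilon. \tag{5.14}$$

Therefore

$$\int_{nb}^{\infty} e^{-(z_1 - nb)} \int_{w(z_1)} \Big( \sum_{i=2}^n z_i \Big) \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n\, dz_1 = (n - 1)\epsilon = (n - 1)\epsilon \int_{nb}^{\infty} e^{-(z_1 - nb)}\, dz_1 \tag{5.15}$$

or

$$\int_{nb}^{\infty} e^{-z_1} g(z_1)\, dz_1 = 0 \tag{5.16}$$

where

$$g(z_1) = \int_{w(z_1)} \Big( \sum_{i=2}^n z_i \Big) \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n - (n - 1)\epsilon. \tag{5.17}$$

Thus finally the condition of unbiasedness reduces to

$$\int_{w(z_1)} \Big( \sum_{i=2}^n z_i \Big) \exp\left[ -\sum_{i=2}^n z_i \right] dz_2 \cdots dz_n = (n - 1)\epsilon \tag{5.18}$$

and we seek the region $w(z_1)$ which satisfies (5.9), (5.10) and (5.18).

By the fundamental lemma $w(z_1)$ is given by

$$\exp\left[ -\frac{1}{a_1} \sum_{i=2}^n z_i \right] > \Big[ c_1(a_1, z_1) \sum_{i=2}^n z_i + c_2(a_1, z_1) \Big] \exp\left[ -\sum_{i=2}^n z_i \right] \tag{5.19}$$

which is equivalent to

$$\sum_{i=2}^n z_i < k_1(a_1, z_1), \; > k_2(a_1, z_1) \tag{5.20}$$

where $k_1$ and $k_2$ are determined by (5.9) and (5.18), and are therefore independent
of $z_1$ and $a_1$. Thus the region (5.20), which of all unbiased similar regions
maximizes the power against the alternative $a = a_1$, is independent of $a_1$ and
hence is a region of type $B_1$. This completes the proof since it is easily verified
that (5.20) is equivalent to the likelihood ratio test.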
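The limits $k_1$, $k_2$ of (5.20) can likewise be obtained numerically. The sketch below is an illustration under one added step that is not spelled out in the text: writing $S = \sum_{i=2}^n Z_i$ and $m = n - 1$, the null density of S is $s^{m-1} e^{-s}/(m-1)!$, so that, given (5.9), condition (5.18) is equivalent to $k_1^m e^{-k_1} = k_2^m e^{-k_2}$. The function names are hypothetical.

```python
import math

def erlang_survival(k, m):
    # P(S > k) for S a sum of m independent unit exponentials.
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= k / j
        total += term
    return math.exp(-k) * total

def matching_upper_point(k1, m):
    # Solve k2 > m with k2**m * exp(-k2) == k1**m * exp(-k1), on the log scale.
    target = m * math.log(k1) - k1
    lo, hi = float(m), 2.0 * m
    while m * math.log(hi) - hi > target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if m * math.log(mid) - mid > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def unbiased_limits(n, eps):
    """Return (k1, k2): reject H_1 when the sum of z_2..z_n is < k1 or > k2."""
    m = n - 1
    lo, hi = 0.0, float(m)
    for _ in range(100):
        k1 = 0.5 * (lo + hi)
        k2 = matching_upper_point(k1, m)
        coverage = erlang_survival(k1, m) - erlang_survival(k2, m)
        if coverage > 1.0 - eps:
            lo = k1                    # acceptance interval too wide
        else:
            hi = k1
    k1 = 0.5 * (lo + hi)
    return k1, matching_upper_point(k1, m)

k1, k2 = unbiased_limits(5, 0.05)
```

The resulting interval is not equal-tailed: the two tail probabilities adjust so that the first-moment condition (5.18) holds together with the size condition (5.9).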

The proof for regions which are not necessarily symmetric in the variables
follows similarly if instead of the transformation (5.2) one uses a transformation
$U_i = f_i(X_1, \dots, X_n)$ which is one to one and such that $U_1 = Z_1$, $U_2 = \sum_{i=2}^n Z_i$.
The distribution of $U_3, \dots, U_n$ is then independent of a and b since $U_1$, $U_2$
are a pair of sufficient statistics for these parameters, and the proof carries over
step by step.

Next we consider the hypothesis H.:b = 0, and again we restrict ourselves to 
regions which are symmetric in the variables, although as before the proof can 
be modified to cover also nonsymmetric regions. 

We first make the transformation to $Z_1, \dots, Z_n$ given by (5.2). In the
n − 1 dimensional space of $Z_2, \dots, Z_n$, we then transform to new variables
$U, \psi_1, \dots, \psi_{n-2}$ where $U = \sum_{i=2}^{n} Z_i$ and where the $\psi$'s are the
generalized polar angles. Obviously the distribution of the $\psi$'s does not depend
on a, since they are homogeneous of degree 0 in the Z's. Furthermore the $\psi$'s are
distributed independently of U since the probability density of the Z's is constant
over the hyperplanes U = u. Thus

$$
p(z_1, u, \psi_1, \dots, \psi_{n-2}) = \frac{K}{a^{n}}\exp\left(-\frac{z_1 + u - nb}{a}\right)u^{n-2}\,p(\psi_1, \dots, \psi_{n-2}).
\tag{5.21}
$$

We next introduce new variables

$$
V = Z_1 + U \quad\text{and}\quad T = \frac{Z_1}{Z_1 + U}
\tag{5.22}
$$

and find

$$
p(v, t, \psi_1, \dots, \psi_{n-2}) = \frac{K}{a^{n}}\exp\left(-\frac{v - nb}{a}\right)v^{n-1}(1-t)^{n-2}\,p(\psi_1, \dots, \psi_{n-2})
\tag{5.23}
$$

for $v > nb$, $nb/v < t < 1$.


For w under $H_2$ to be similar with respect to a, we must have

$$
\frac{K}{a^{n}}\int_{0}^{\infty}\exp\left(-\frac{v}{a}\right)v^{n-1}\left[\int_{w_0(v)}(1-t)^{n-2}\,p(\psi_1, \dots, \psi_{n-2})\,dt\,d\psi_1\cdots d\psi_{n-2}\right]dv
= \varepsilon\,\frac{K}{a^{n}}\int_{0}^{\infty}\exp\left(-\frac{v}{a}\right)v^{n-1}\,dv
\tag{5.24}
$$

where $w(v)$ designates the intersection of w with the hyperplane V = v, and
where $w_0(v)$ denotes the part of $w(v)$ lying between the hyperplanes t = 0
and t = 1.

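Hence the condition of similarity may be written as

$$
\int_{0}^{\infty}\exp\left(-\frac{v}{a}\right)v^{n-1}f(v)\,dv = 0 \quad\text{for all } a > 0
\tag{5.25}
$$

where

$$
f(v) = \int_{w_0(v)}(1-t)^{n-2}\,p(\psi_1, \dots, \psi_{n-2})\,dt\,d\psi_1\cdots d\psi_{n-2} - \varepsilon.
\tag{5.26}
$$

By the uniqueness theorem for Laplace transforms, (5.25) implies f(v) = 0
for all v > 0, so that the condition of similarity finally reduces to

$$
\int_{w_0(v)}(1-t)^{n-2}\,p(\psi_1, \dots, \psi_{n-2})\,dt\,d\psi_1\cdots d\psi_{n-2} = \varepsilon.
\tag{5.27}
$$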


Of all similar regions, let us find the one which has maximum power. Obviously
we want to include in $w(v)$ all points for which t < 0. In addition we want
to choose $w(v)$ such that

$$
\int_{w_b(v)}(1-t)^{n-2}\,p(\psi_1, \dots, \psi_{n-2})\,dt\,d\psi_1\cdots d\psi_{n-2} = \max
\tag{5.28}
$$

where $w_b(v)$ is that part of $w(v)$ in which $\max(0, nb/v) < t < 1$.

If, for some alternative b, $w_0(v)$ is contained in $nb/v < t < 1$, then $w_b(v)$ and
$w_0(v)$ coincide and hence (5.28) attains its maximum value $\varepsilon$ whatever the
position of $w_0(v)$ in $nb/v < t < 1$. If on the other hand $nb/v$ is so close to 1 that
$nb/v < t < 1$ is too small to contain $w_0(v)$, then (5.28) attains its maximum for
any $w_0(v)$ containing $nb/v < t < 1$. There exists therefore a unique $w_0(v)$ which
maximizes (5.28) for all values of b and v, namely the region defined by

$$
C(v) < t < 1
\tag{5.29}
$$

where C is determined by (5.27).
Since under $H_2$ the statistics V and T are independent, C does not depend
on v. The test

$$
T = \frac{Z_1}{Z_1 + U} > C
\tag{5.30}
$$

which we have just shown to be uniformly most powerful, is also the likelihood
ratio test, which completes the proof of the theorem.
We shall finally consider an example of an optimum test in connection with a
rectangular distribution. Let $X_1, \dots, X_n$ be independently and uniformly
distributed over $(a, a + \theta)$, where $\theta$ is positive. For testing the hypothesis
$H: a = a_0$, the test


Yi — 4% 
(5.31) rose 2¢ 
where $Y_1$ and $Y_n$ are the smallest and the largest of the X's respectively, is the
uniformly most powerful of all similar regions.

The proof of this goes through very much like that for $H_2$ in Theorem 5.
Without loss of generality we take $a_0 = 0$. Also again, to simplify the proof,
we restrict ourselves to regions which are symmetric in the variables. We need
the following lemma.

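LEMMA. Let $X_1, \dots, X_n$ be independently and uniformly distributed over
$(a, a + \theta)$. Let $Y_i$ denote the ith X in order of magnitude, and let

$$
T_n = Y_n, \qquad T_k = \frac{Y_k}{Y_{k+1}} \quad (k = 1, \dots, n-1).
\tag{5.32}
$$

Then for a > 0

$$
p(t_1, \dots, t_n) = \frac{n!}{\theta^{n}}\,t_2\,t_3^{2}\cdots t_n^{n-1}
\tag{5.33}
$$

when

$$
a \le t_n \le a + \theta, \qquad \frac{a}{t_n t_{n-1}\cdots t_{k+1}} \le t_k \le 1 \quad (k = 1, \dots, n-1).
$$

This is easily seen by applying the usual method of Jacobians. The inequalities
describing the sample space of the T's are equivalent to the following more
convenient ones:

$$
a \le t_n \le a + \theta, \qquad \frac{a}{t_n} \le t_1 t_2\cdots t_{n-1} \le 1, \qquad t_k \le 1 \quad (k = 1, \dots, n-1).
\tag{5.34}
$$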
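
The Jacobian step behind (5.33) can be written out as follows. This is only a sketch: it assumes the reading $T_n = Y_n$, $T_k = Y_k/Y_{k+1}$ of (5.32) and uses the standard joint density $n!/\theta^{n}$ of the order statistics of a uniform sample on an interval of length $\theta$.

```latex
% Inverse transformation: each order statistic is a product of the t's.
y_k = t_k\,t_{k+1}\cdots t_n \quad (k = 1, \dots, n),
\qquad
\frac{\partial y_k}{\partial t_i} = 0 \ \ (i < k),
\qquad
\frac{\partial y_k}{\partial t_k} = t_{k+1}\cdots t_n .

% The matrix (\partial y_k / \partial t_i) is triangular, so its determinant
% is the product of the diagonal entries (the entry for k = n equals 1):
\left|\frac{\partial(y_1, \dots, y_n)}{\partial(t_1, \dots, t_n)}\right|
= \prod_{k=1}^{n-1} t_{k+1}\cdots t_n
= t_2\,t_3^{2}\cdots t_n^{n-1},

% and therefore
p(t_1, \dots, t_n) = \frac{n!}{\theta^{n}}\;t_2\,t_3^{2}\cdots t_n^{n-1}.
```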


Let us denote by $w(t_n)$ the intersection of a region w with the hyperplane
$T_n = t_n$, and by $w_0(t_n)$ that part of $w(t_n)$ contained in the cylinder $0 < t_k < 1$
(k = 1, ..., n − 1); then we find as a necessary and sufficient condition for
w to be similar with respect to $\theta$ (assuming H)

$$
(n-1)!\int_{w_0(t_n)} t_2\,t_3^{2}\cdots t_{n-1}^{n-2}\,dt_1\cdots dt_{n-1} = \varepsilon.
\tag{5.35}
$$


Of all regions satisfying (5.35) we want to find the most powerful one. Let
us first consider alternatives a > 0. If $w_a(t_n)$ denotes the common part of $w_0(t_n)$
and the region

$$
\frac{a}{t_n} \le t_1 t_2\cdots t_{n-1} \le 1,
\tag{5.36}
$$

we must choose $w_a(t_n)$ such that

$$
\int_{w_a(t_n)} t_2\,t_3^{2}\cdots t_{n-1}^{n-2}\,dt_{n-1}\cdots dt_1 = \max.
\tag{5.37}
$$

From this it follows easily that against alternatives a > 0 the uniformly best
choice for $w_0(t_n)$ is

$$
t_1 t_2\cdots t_{n-1} = \frac{y_1}{y_n} \ge C'(t_n),
\tag{5.38}
$$

and since under H, $Y_1/Y_n$ is distributed independently of $T_n$, $C'(t_n)$ does not
depend on $t_n$.

Consider next alternatives a < 0. We include in the region of rejection all
points for which $Y_1 < 0$. To determine $w_0(t_n)$ we notice that, given $Y_1 > 0$,
the X's are uniformly distributed between 0 and $a + \theta$ (provided $a + \theta > 0$;
the case $a + \theta < 0$ is trivial). Hence the probability distribution of the T's
given $Y_1 > 0$ is

$$
p(t_1, \dots, t_n \mid Y_1 > 0) = \frac{n!}{(a + \theta)^{n}}\,t_2\,t_3^{2}\cdots t_n^{n-1}
\tag{5.39}
$$

when

$$
0 < t_n < a + \theta, \qquad 0 < t_k < 1 \quad (k = 1, \dots, n-1).
$$

Thus

$$
\frac{p(t_1, \dots, t_{n-1} \mid t_n,\ a < 0,\ Y_1 > 0)}{p(t_1, \dots, t_{n-1} \mid t_n,\ a = 0)}
\tag{5.40}
$$

is independent of $t_1, \dots, t_{n-1}$ and hence the power of w against alternatives
a < 0 is independent of the choice of $w_0(t_n)$. Therefore the region

$$
y_1 < 0, \qquad \frac{y_1}{y_n} > C'
\tag{5.41}
$$

is uniformly most powerful against all alternatives. But (5.41) is equivalent to


(5.42) nn SE > C. 
Yn set Y1 
It is interesting to compare this result with that for the corresponding simple
hypothesis. Let H' be the hypothesis: a = 0 when the X's are assumed independently
and uniformly distributed over (a, a + 1). There exists no uniformly
most powerful test of H'; instead, the two uniformly most powerful one-sided
tests exist. By analogy with the normal case one might then expect for H'
that of all tests with symmetric power functions, there be a uniformly most
powerful one. This however is not so: there exist infinitely many admissible
tests with symmetric power function.

In this and the previous section we restricted ourselves to problems involving
only one nuisance parameter. However, the method applies also to problems
involving several nuisance parameters.

In the usual way (cf. [20, 9]) the results of this section may be translated to
give optimum sets of confidence intervals for estimating the parameters in question.
In this connection it is an open question whether the confidence regions
based on the type $B_1$ tests discussed in section 2 will always be intervals; one
would expect this to be the case.

The author wishes to acknowledge his indebtedness to Professor P. L. Hsu 

for many helpful suggestions. 
Added in proof: In a joint paper by Professor Henry Scheffé and the present 
author which has been submitted to the Proceedings of the National Academy of 
Sciences, a result is given concerning the existence of certain 1:1 transformations. 
This result bears on Section 4 of the present paper where a question arises con- 
cerning the existence of a 1:1 transformation. The existence of such a trans- 
formation is now assured and, as a consequence, the last paragraph of Section 4 
has become superfluous.


A CORNER TEST FOR ASSOCIATION
By PAUL S. OLMSTEAD AND JOHN W. TUKEY


Bell Telephone Laboratories and Princeton University


1. Summary. This paper proposes a new test (the “quadrant sum”) for 
the association of two continuous variables. Its notable properties are: 

(1) Special weight is given to extreme values of the variables. 

(2) Computation is very easy. 

(3) The test is non-parametric. 
Significance levels (for the quadrant sum) are given to the accuracy needed for 
practical use. To this accuracy they are independent of sample size (see Fig. 1). 
The generating function of the quadrant sum is given for the null hypothesis 
(no association = independence). A limiting distribution is deduced and com- 
pared with the cases 2n = 4, 6, 8,10, and 14. Extension to higher dimensions 
and application to serial correlation are discussed. 


2. Description of test (even number in sample). We shall describe the 
test as though a scatter diagram had already been drawn. The possibilities of 
direct computation from tabular data are indicated by the examples in sections 
8 and 9. 

In the scatter diagram, draw the two lines, $x = x_m$, $y = y_m$, where $x_m$ is the
median of the x-values without regard to the values of y, and $y_m$ is the median
of the y-values without regard to the values of x. Think of the four quadrants 
or corners thus formed as being labelled +, —, +, —, in order, so that the upper 
right and lower left quadrants are positive. Beginning at the right hand side 
of the diagram, count in (in order of abscissae) along the observations until 
forced to cross the horizontal median. Write down the number of observations 
met before this crossing, attaching the sign + if they lay in the + quadrant, 
and the sign — if they lay in the — quadrant. Repeat this process moving 
up from below, moving to the right from the left, and moving down from above. 
The quadrant sum is the algebraic sum of the four terms thus written down. 
This process is illustrated in Fig. 2, where the black dots represent contributions 
to the sum, and the dotted lines, crossings. 
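
The counting rule just described can be sketched in code. The function below is an illustrative sketch only, not the authors' procedure as printed: the name `quadrant_sum` and its helper are hypothetical, and it assumes an even number of pairs with no ties, so that both medians pass strictly between the points.

```python
from statistics import median


def quadrant_sum(pairs):
    """Quadrant sum for an even number of (x, y) pairs with no ties (assumed)."""
    if len(pairs) % 2 != 0:
        raise ValueError("this sketch expects an even number of pairs")
    xm = median(x for x, _ in pairs)
    ym = median(y for _, y in pairs)

    def term(ordered, side, sign_of_side):
        # Walk inward from one edge of the diagram and count points until
        # the first one that lies on the other side of the opposite median.
        first = side(ordered[0])
        run = 0
        for p in ordered:
            if side(p) != first:
                break
            run += 1
        return sign_of_side[first] * run

    by_x = sorted(pairs, key=lambda p: p[0])
    by_y = sorted(pairs, key=lambda p: p[1])

    def above(p):
        return p[1] > ym

    def right(p):
        return p[0] > xm

    # Upper right and lower left quadrants are positive.
    from_right = term(by_x[::-1], above, {True: +1, False: -1})
    from_left = term(by_x, above, {True: -1, False: +1})
    from_top = term(by_y[::-1], right, {True: +1, False: -1})
    from_bottom = term(by_y, right, {True: -1, False: +1})
    return from_right + from_left + from_top + from_bottom
```

For four points lying on a rising diagonal every term equals +2 and the sum is +8; reversing the order of the y-values gives −8.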

When there are an even number of pairs (x, y) and no ties, the medians will 
pass between the points. In this, the simplest case, the distribution of the 
quadrant sum is known for the hypothesis of no association (that is, of inde- 
pendence), and significance levels are given in Table 1 for the magnitude (abso- 
lute value) of the sum. It will be noticed that the sample size does not enter in 
any important way. 

The cases of an odd number of observations and of ties are discussed in the
next two sections. Simple devices make the test usable in most cases. A very
great tendency toward ties, however, will make it inapplicable. This will be
unimportant in most applications because of the fact that attention is being
directed to the periphery.





Fig. 2. Scatter diagram of 116 pairs of observations. Individual terms: top = +3, right = +1, bottom = +6, left = +6½. Quadrant sum: |S| = 16½, P = 0.5%.


The set of data which prompted the development of the test is shown in Fig. 2.
The accompanying report described it as follows: "The various points appear
to be scattered almost completely at random and give little indication of
correlation." The quadrant sum is 16½, which is significant at the 0.5% point.
Intuitively, the significant association of the peripheral points is clear.


3. Description of test (odd number in sample). If the sample size is odd,
then we may usually follow the process outlined above. We will have difficulty
only when the counting process meets a point, one of whose coordinates is a
median. In this case we employ a simple device, namely:

Given a sample of 2n + 1 pairs, let x* and y* be the medians of the x-values
and of the y-values, respectively. Let the pairs in which they occur be $(x^*, y_k)$
and $(x_m, y^*)$, respectively. Replace these two pairs by the single pair $(x_m, y_k)$.
There are now 2n pairs and the regular method can be applied.

The quadrant sum so obtained from an unassociated population has the same 
distribution as that formed directly from 2n pairs. 
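
The device for an odd sample can be sketched as follows. The function name `reduce_odd_sample` is hypothetical and no ties are assumed. The text does not say what to do when a single pair carries both medians; dropping that pair is an assumption of this sketch.

```python
from statistics import median


def reduce_odd_sample(pairs):
    """Collapse 2n + 1 untied (x, y) pairs to 2n pairs, as described above."""
    if len(pairs) % 2 != 1:
        raise ValueError("this sketch expects an odd number of pairs")
    x_med = median(x for x, _ in pairs)
    y_med = median(y for _, y in pairs)
    ix = next(i for i, (x, _) in enumerate(pairs) if x == x_med)  # (x*, y_k)
    iy = next(i for i, (_, y) in enumerate(pairs) if y == y_med)  # (x_m, y*)
    rest = [p for i, p in enumerate(pairs) if i not in (ix, iy)]
    if ix != iy:
        # Replace the two median-bearing pairs by the single pair (x_m, y_k).
        rest.append((pairs[iy][0], pairs[ix][1]))
    # Assumption: if one pair carries both medians, it is simply dropped.
    return rest
```

With the five pairs (1, 5), (2, 1), (3, 4), (4, 3), (5, 2), the pairs (3, 4) and (4, 3) carry the medians and are replaced by (4, 4).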


4. Description of test (treatment of ties). The behavior of the test is known
when (1) there is no association, (2) the probability of a tie in x-values or y-values
is zero. The following approximation, which has an unknown effect on the
distribution, is suggested when ties are present:

When a tied group is reached, count the number in the tied group favorable
to continuing and the number unfavorable. Treat the tied group as if the
number of its points preceding the crossing of the median were

        number favorable
    -----------------------
    1 + number unfavorable

It seems likely that this approximation is conservative.


TABLE 1
Working significance levels for magnitudes of quadrant sums

    Significance level (conservative)    Magnitude of quadrant sum*
    10%                                   9
    5%                                    11
    2%                                    13
    1%                                    14-15
    0.5%                                  15-17
    0.2%                                  17-19
    0.1%                                  18-21

* The smaller magnitude applies for large sample size, the larger magnitude
for small sample size. Magnitudes equal to or greater than twice the sample
size less six should not be used.
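
As a small worked illustration of the tie rule, the fraction above can be evaluated exactly. The helper name `tied_group_count` is hypothetical; the only thing taken from the text is the formula itself.

```python
from fractions import Fraction


def tied_group_count(favorable, unfavorable):
    """Points credited to a tied group: favorable / (1 + unfavorable)."""
    if favorable < 0 or unfavorable < 0:
        raise ValueError("counts must be non-negative")
    return Fraction(favorable, 1 + unfavorable)
```

A tied group with one favorable and one unfavorable point is credited with one half, which is how a half-integer term can arise in a quadrant sum.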


5. Discussion. When a moderate number, say 25 to 200, of paired observa- 
tions on two quantities are plotted as a scatter diagram, visual examination 
frequently detects what seems to be definite evidence of association between 
the variables. Often in such cases, the usual methods for measuring association
do not find statistical significance of association. Visual judgment, particularly
by engineers or scientists who may wish to take action on the basis
of their findings, gives greater weight to observations near the periphery of the 
scatter diagram. This is not always desirable—but often it is very desirable. 
A quantitative test of association with such concentration on the periphery has 
been lacking. The quadrant sum test was developed to fill the gap. Its fea- 
tures of speed and non-parametricity are useful but secondary from this point 
of view. 

When uniform attention to the whole scatter diagram is desired, the quadrant 
sum test is of unknown usefulness. We know little enough of the operating 
characteristics of the more conventional tests, such as: 

1. The product moment correlation coefficient 

2. The four-fold table formed by the medians 

3. The biserial correlation coefficient 

4. The rank correlation coefficient 
and less about the operating characteristics of the present test. In this case, 
the quadrant sum test can only be recommended definitely for exploratory 
investigations of large amounts of data. 

There are many situations, however, where we do not know where to concen- 
trate our attention, and where speed and non-parametricity are cardinal virtues 
in a test. One example is the use of serial correlation in studying industrial 
processes. We may guess that here we are interested in the periphery, but 
neither theory nor experience can, so far, prove this. In such situations the 
quadrant sum is by far the fastest to use of any of the tests known to the authors, 
and we believe one of the most useful. 


6. Elementary derivations. We can easily find the distribution of 
1. An individual term of the quadrant sum 
a. For fixed sample size 
b. In the limit 
2. The quadrant sum itself 
a. For fixed sample size 
b. In the limit, assuming asymptotic independence of the four terms. 
This we shall do now, leaving the proof that 2a actually converges to 2b to a 
later section. 


Consider a sample of 2n pairs $(x_1, y_1), \dots, (x_{2n}, y_{2n})$ from a population in
which x and y are independent. It is both clear and easily verifiable that
1. The set of 2n x-values, $x_1, \dots, x_{2n}$
2. The set of 2n y-values, $y_1, \dots, y_{2n}$
3. The permutation of the order of the y-values when the pairs are ordered
by the x-values
which together determine the sample, are independently distributed, and that
any permutation is as likely as every other. (We have assumed no ties, which
is a consequence, with probability one, of the continuous cumulative distributions
of x and y.) Since the quadrant sum depends only on the permutation,
its distribution in the absence of association does not depend on the distributions
of x and y.

We must solve, then, certain purely combinatorial problems, under the
hypothesis that the (2n)! permutations of the y-values are all equally likely.
It may simplify matters to assume that the values of x in the sample are 1, 2, ...,
2n and that those of y are the same. How, then, do we calculate the distribution
of a single term of the quadrant sum? Let us begin with small x-values,
and the pair $(1, y_1)$. If $y_1 = 1, 2, \dots, n$, we count "one" positive, and if
$y_1 = n + 1, n + 2, \dots, 2n$, we count "one" negative. We pass on to $(2, y_2)$
and so on. How many permutations yield a count of exactly k positive values?
Those in which $y_1, y_2, \dots, y_k$ are equal to or less than n, $y_{k+1}$ is equal to or greater
than n + 1, and the other (2n − k − 1) y's are arbitrary. There are:


$$
n(n-1)\cdots(n-k+1)\cdot n\cdot(2n-k-1)!
$$

such permutations, the fraction of all (2n)! permutations being:

$$
\frac{n(n-1)\cdots(n-k+1)\,n}{(2n)(2n-1)\cdots(2n-k+1)(2n-k)}
\tag{1}
$$

which is, then, the probability that this contribution will equal +k, or by symmetry,
the probability that it will equal −k, k ≠ 0.
For large n, this becomes merely:

$$
P_k = \frac{1}{2^{|k|+1}}, \qquad k \ne 0.
\tag{2}
$$
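
Formulas (1) and (2) are easy to evaluate exactly. The sketch below uses hypothetical function names and the reading of (1) as a product of k + 1 ratios; it is meant only to illustrate how quickly the finite-sample probability approaches the limit.

```python
from fractions import Fraction


def term_probability(n, k):
    """Exact Pr(term = +k) for a sample of 2n pairs, following formula (1)."""
    if not 1 <= k <= n:
        return Fraction(0)
    p = Fraction(1)
    for i in range(k):
        # The first k y-values all fall on the same side of the median ...
        p *= Fraction(n - i, 2 * n - i)
    # ... and the next one falls on the other side.
    return p * Fraction(n, 2 * n - k)


def limiting_term_probability(k):
    """Limiting Pr(term = k) for large n, formula (2); k must be non-zero."""
    if k == 0:
        raise ValueError("formula (2) applies to k != 0 only")
    return Fraction(1, 2 ** (abs(k) + 1))
```

For 2n = 6 the probabilities of +1, +2, +3 are 3/10, 3/20 and 1/20, which sum to 1/2 as symmetry requires; the limiting values are 1/4, 1/8 and 1/16.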


In order to obtain the distribution of the quadrant sum itself, we must concern
ourselves with the lack of independence of the four terms. This is indicated
most clearly in the case of 2n = 2, where the 2! = 2 permutations yield
+1 +1 +1 +1 = 4 and −1 −1 −1 −1 = −4. Here, there is complete lack
of independence. We shall see later that there is effectively independence in
the limit, so that it is worth while to calculate the sum of four independent
terms with the limiting distribution (2) and find that it satisfies:

$$
\Pr(|\text{independent sum of 4 terms}| \ge k) = \frac{9k^{3} + 9k^{2} + 168k + 208}{216\cdot 2^{k}}, \qquad k > 0.
\tag{3}
$$

The details will be omitted.
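
Since the details are omitted, formula (3) can at least be checked numerically. The sketch below convolves four copies of the limiting distribution (2), truncated at an assumed cutoff of 60 (the neglected mass is of order 2^-60), and compares the resulting tail probabilities with the closed form. The function names are hypothetical.

```python
def tail_formula(k):
    """Right-hand side of formula (3): Pr(|sum of 4 terms| >= k), k > 0."""
    return (9 * k**3 + 9 * k**2 + 168 * k + 208) / (216 * 2**k)


def limiting_tails(max_k, cutoff=60):
    """Tail probabilities of the sum of four independent terms drawn from (2)."""
    # Single-term distribution (2), truncated at |value| <= cutoff.
    single = {v: 2.0 ** -(abs(v) + 1)
              for v in range(-cutoff, cutoff + 1) if v != 0}
    dist = {0: 1.0}
    for _ in range(4):  # convolve four independent terms
        nxt = {}
        for s, ps in dist.items():
            for v, pv in single.items():
                nxt[s + v] = nxt.get(s + v, 0.0) + ps * pv
        dist = nxt
    return {k: sum(p for s, p in dist.items() if abs(s) >= k)
            for k in range(1, max_k + 1)}
```

At k = 1 the closed form gives 394/432, that is, the sum of four independent terms is zero with probability 38/432.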

A simple device, reminiscent of Wald’s [3, 1943] establishment of the two- 
dimensional tolerance limits enables us to avoid difficulties with lack of inde- 
pendence and compute the exact distribution of the quadrant sum for any n. 
We decompose the permutation of the y-values into the following parts, which 
together specify the permutation: 

(a) The number, j, of pairs in the upper right quadrant.

(b) The set of j values of x between n + 1 and 2n corresponding to pairs in
the upper right quadrant.

(c) The set of j values of y between n + 1 and 2n corresponding to points
in the upper right quadrant.

(d) The set of j values of x between 1 and n corresponding to pairs in the
lower left quadrant. (Note that the use of medians ensures that the
lower left and upper right quadrants contain the same number of points.)

(e) The set of j values of y between 1 and n corresponding to pairs in the
lower left quadrant.

(f) The permutation of j objects defined by the pairs in the upper right
quadrant.

(g) The permutation of n − j objects defined by the pairs in the upper left
quadrant.

(h) and (i) the permutations from the remaining quadrants.

It is easily verified that: (1) given j, items (b) to (i) can be assigned at will, (2)
each assignment of (a) to (i) corresponds to one and only one permutation, (3) the
quadrant sum depends only on items (b) to (e). In fact, the right hand term
depends on item (b), the upper term on item (c), the left hand term on item (d)
and the lower term on item (e). While j remains fixed, the terms behave
independently.

For fixed j, what is the distribution of a single term? If a set of j x-values
gives the term +k, it must contain the k largest x-values and not contain the
next. There are:

$$
\binom{n-k-1}{n-j-1}
$$

such sets. The generating function for a single term is, then:

$$
\sum_{k=1}^{j}\binom{n-k-1}{n-j-1}z^{k} + \sum_{k=1}^{n-j}\binom{n-k-1}{j-1}z^{-k}.
\tag{4}
$$

Since the terms are independent for fixed j, and there are $(j!)^{2}((n-j)!)^{2}$
ways to supply the permutations forming items (f) to (i), the generating function
for the quadrant sum, $S_n$, is:

$$
G_n(z) = \sum_{j=0}^{n}\frac{(j!)^{2}((n-j)!)^{2}}{(2n)!}\left[\sum_{k=1}^{j}\binom{n-k-1}{n-j-1}z^{k} + \sum_{k=1}^{n-j}\binom{n-k-1}{j-1}z^{-k}\right]^{4}.
\tag{5}
$$
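
The generating function (5) can be expanded mechanically to give the exact distribution for any n. The sketch below is one way to do this, with hypothetical function names. The boundary cases j = 0 and j = n, where the binomial coefficients in (4) have a negative upper index, are handled by counting the single admissible set directly; that treatment is an assumption of the sketch rather than something stated in the text.

```python
from fractions import Fraction
from math import comb, factorial


def single_term_counts(n, j):
    """Number of j-subsets giving each value of one term, as {value: count}."""
    counts = {}
    for k in range(1, j + 1):
        # Term +k: the k most extreme values are in the set, the next is not.
        c = 1 if k == n else comb(n - k - 1, j - k)
        if c:
            counts[k] = c
    for k in range(1, n - j + 1):
        # Term -k: the k most extreme values are not in the set, the next is.
        if k == n:
            c = 1  # only possible when j == 0
        elif j >= 1:
            c = comb(n - k - 1, j - 1)
        else:
            c = 0
        if c:
            counts[-k] = c
    return counts


def quadrant_sum_distribution(n):
    """Exact null distribution of the quadrant sum for 2n pairs, from (5)."""
    dist = {}
    total = factorial(2 * n)
    for j in range(n + 1):
        weight = Fraction(factorial(j) ** 2 * factorial(n - j) ** 2, total)
        term = single_term_counts(n, j)
        four = {0: 1}
        for _ in range(4):  # fourth power of the single-term polynomial
            nxt = {}
            for s, c in four.items():
                for v, cv in term.items():
                    nxt[s + v] = nxt.get(s + v, 0) + c * cv
            four = nxt
        for s, c in four.items():
            dist[s] = dist.get(s, 0) + weight * c
    return dist


def tail(n, k):
    """Pr(|quadrant sum| >= k) for a sample of 2n pairs."""
    return sum(p for s, p in quadrant_sum_distribution(n).items() if abs(s) >= k)
```

For 2n = 4 this gives Pr(|S| ≥ 1) = 3/4 and Pr(|S| ≥ 5) = 1/3, and for 2n = 6 it gives Pr(|S| ≥ 1) = 14/15, in agreement with the entries 0.7500, 0.3333 and 0.9333 of Table 2.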


The exact probability of equalling or exceeding each value of S, has been 
computed for 2n = 2, 4, 6, 8, 10, and 14. Table 2 gives these probabilities 
and Fig. 3 shows the values of 


$$
\frac{1}{m}\log_{10}\Pr(|\text{quadrant sum}| \ge m)
$$


this particular function being chosen for its relative constancy. The maximum
value of the quadrant sum is 4n, and for values of k less than 4n − 6, there
is quite good agreement between the curves for finite n and formula (3) at
the practically significant percentage points. The situation for very small
probabilities suggests a careful consideration of the limiting behavior of the
quadrant sum distribution (see section 10).


TABLE 2
Probability of a Sum of Absolute Value Equal to or Greater than k when a Sample
of 2n is Drawn from an Unassociated Population

    k            2n = 2    2n = 4    2n = 6    2n = 8    2n = 10   2n = 14   2n = ∞*
    0            1.0000    1.0000    1.0000    1.0000    1.0000    1.0000    1.000000
    1            1.0000    0.7500    0.9333    0.9036    0.9106    0.9115    0.912037
    2            1.0000    0.7500    0.7556    0.7544    0.7567    0.7580    0.754630
    3            1.0000    0.4167    0.6000    0.6000    0.6008    0.6039    0.599537
    4            1.0000    0.4167    0.4667    0.4619    0.4662    0.4690    0.462963
    5            0.0000    0.3333    0.3111    0.3508    0.3519    0.3547    0.346933
    6            0.0000    0.3333    0.2222    0.2619    0.2589    0.2611    0.252025
    7            0.0000    0.3333    0.1556    0.1821    0.1867    0.1876    0.177662
    8            0.0000    0.3333    0.1111    0.1258    0.1333    0.1322    0.121817
    9            0.0000    0.0000    0.1000    0.0839    0.0928    0.0918    0.081471
    10           0.0000    0.0000    0.1000    0.0554    0.0642    0.0632    0.053295
    11           0.0000    0.0000    0.1000    0.0375    0.0436    0.0432    0.034189
    12           0.0000    0.0000    0.1000    0.0304    0.0290    0.0296    0.021557
    13           0.0000    0.0000    0.0000    0.0286    0.0190    0.0202    0.013386
    14           0.0000    0.0000    0.0000    0.0286    0.0127    0.0139    0.008200
    15           0.0000    0.0000    0.0000    0.0286    0.0095    0.0096    0.004963
    16           0.0000    0.0000    0.0000    0.0286    0.0083    0.0066    0.002972
    17           0.0000    0.0000    0.0000    0.0000    0.0079    0.0045    0.001762
    18           0.0000    0.0000    0.0000    0.0000    0.0079    0.0031    0.001036
    19           0.0000    0.0000    0.0000    0.0000    0.0079    0.0021    0.000604
    20           0.0000    0.0000    0.0000    0.0000    0.0079    0.0014    0.000350
    21           0.0000    0.0000    0.0000    0.0000    0.0000    …         0.000201
    22           0.0000    0.0000    0.0000    0.0000    0.0000    …         0.000115
    23           0.0000    0.0000    0.0000    0.0000    0.0000    0.0006    0.000065
    24           0.0000    0.0000    0.0000    0.0000    0.0000    0.0006    0.000036
    25           0.0000    0.0000    0.0000    0.0000    0.0000    0.0006    0.000020
    26           0.0000    0.0000    0.0000    0.0000    0.0000    0.0006    0.000011
    27           0.0000    0.0000    0.0000    0.0000    0.0000    0.0006    0.000006
    28           0.0000    0.0000    0.0000    0.0000    0.0000    0.0006    0.000003
    29           0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.000002
    30           0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.000001
    31 or over   0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.000000
    Variance
    of k         16        24        26 2/5    26 6/7    26 16/21  …         24

* Probability for 2n = ∞, k > 0, is given by

$$
\frac{9k^{3} + 9k^{2} + 168k + 208}{216\cdot 2^{k}}.
$$


Fig. 3. Comparative relationships for finite and infinite sample sizes and
normal approximation to the infinite sample size


The device for samples of 2n + 1 deserves a word of justification. If there
is no association, the 2n + 1 y-values are randomly paired with the 2n + 1
x-values, and, in particular, the y-value paired with the x-median is randomly
selected. If we pair it with the (randomly selected) x-value which was paired
with the y-median we still have random pairing. The pairing of the 2n pairs
is random, although neither the x-values nor the y-values make up a sample.
The randomness of pairing is all that has been used in the discussions of this
section.


7. Extension to higher dimensions. The same ideas that underlie the quad- 
rant sum test for two variables may be extended in several ways to give tests 
for various types of association among three or more variables. Only one 
three-variable case will be discussed here, leaving further extension to the 
reader. 

Given three variables, x, y, and z, and a sample of matched observations on 
these, it is clearly possible to use the simple quadrant sum test for two variables 
to investigate association between x and y separately, between y and z separately, 
and between z and x separately. If the Pearson coefficient of correlation were 
being computed and were found to be close to zero for each of these pairs, it 
would be assumed that there was no detectable association through the second 
moments. In a trivariate normal or Gaussian distribution, where the first and 
second moments determine the whole distribution, if there is independence be- 
tween the separate pairs of variables, there is no possibility of a three-way 
association. It is of some interest, however, to notice that a corner sum test 
can be devised that will measure the effect of such triple association in case it 
does exist. 

Consider the octants into which the three median planes for x, y, and z,
respectively, divide the three dimensional scatter diagram and label the octants 
alternately plus and minus, in the manner suggested by Fig. 4. More precisely, 
an octant is counted as plus if an odd number, that is three or one, of the vari- 
ables are greater than the medians of the sample, and the remaining octants are 
labelled minus. It is clear that we may repeat the process of coming in along 
each axis passing from observation to observation as long as they remain in a 
region of fixed sign, and writing down as a contribution to the final or octant 
sum the number of such consecutive elements and the sign of the region in which 
they were found. There will be six terms rather than four, as was the case 
for the test based on quadrants, and so a new set of significance levels will be 
required. Table 3, following, lists the situation for a very large sample. 

The situation has been sketched for the case of 2n triples. If there are 2n + 1
triples, then we may have trouble with the medians again. However, a similar
device works, except that we must agree on a last variable in order to form the
synthetic triples uniquely. For example, consider the triples (m, 3, 5), (9, m, 1),
(12, 4, m), where m denotes the median. Taking the order in which the variables
are written, we get (12, 3, 5) and (9, 4, 1) as the synthetic triples. Other
orders would yield (9, 3, 5) and (12, 4, 1) or (9, 3, 1) and (12, 4, 5). This slight
dissymmetry is not pleasing but should give no difficulty.


Fig. 4. Octant schematic: solid sections taken as positive


TABLE 3
Working significance levels for the magnitudes of the octant sum

    Significance Level    Magnitude of Octant Sum*
    10%                   11
    5%                    13
    2%                    15
    1%                    16
    0.5%                  18
    0.2%                  20
    0.1%                  21

* Computed for large samples only and based on normal approximation, see
section 11 for discussion of this and higher dimensional cases.


8. Nongraphical example. The following example of 78 successive observations
of four variables shows how this test may be applied without plotting and
how simple the computation still remains. The data concern a metallurgical
problem in mass production and are taken from L. H. C. Tippett, Table XXII,
page 63 [2]. An excerpt from the data is given in Table 4 together with Tippett's
calculated correlations. This table also shows the preliminary marking
of each individual measurement as above (+) for its variable, below (−), or
on the median (0). From this table we see, for example, that increasing T contributes
a term −3 to the quadrant sum for T and D. It is often desirable to
prepare auxiliary tables to assist in computing the components of the quadrant
and hyperquadrant sums. Such a table is Table 5 for low values of Fuel (F−)
arranged in consecutive ascending numerical order. The entries on this table
for the five columns headed F, T, M, A, and D are directly comparable to the
entries in Table 4. For example, F = 155 is − with respect to the fuel median
and T = 7, −; M = 1541, +; A = 1762, +; D = 152, +. The double, triple,
quadruple and quintuple headed columns contain simply the algebraic multiplication
of the signs in the appropriate T, M, A, or D columns. Thus, TM
for F = 155 is −, MAD is +, and TMAD is −. The contribution to each
quadrant or hyperquadrant sum is simply the count of the consecutive like
signs from the top of a column. For column AD, we have 7 consecutive +
signs and since the contribution is to FAD and F is −, the contribution in this
case to the octant sum is −7. The results from the ten tables of which Table 5
is a sample are then carried to the summary computation shown in Table 6.
The contribution from Table 5 is shown on line F−. The totals are computed
and their probabilities of occurrence determined.


TABLE 4
Excerpt from Tippett's Table

    Time T*    Fuel F*    Material M*    Articles A*    Duration D*
    1  −       246 +      1457 −         1895 +         168.5 +
    2  −       196 −      2078 +         2121 +         152 +
    3  −       192 −      1278 −         1437 −         153 +
    4  −       202 +      1398 −         1497 −         145 −
    5  −       206 +      1944 +         1592 +         153 +
    6  −       218 +      1446? −        1506 −         147.5 −
    7  −       155 −      1541 +         1762 +         152 +
    8  −       201 +      1502 +         1818 +         144.5 −
    9  −       21? +      1950 +         1144 −         151.5 +
    10 −       236 +      1768 +         1654 +         151.5 +
    etc. to
    78 +       185 −      1536 +         1442 −         152 +

    Median     Median     Median         Median         Median
    39.5       199        1474           1588           149.5

* Location of observation relative to column median; + = above; − = below.

Tippett's correlations (based on lightly rounded data)
    r_FM   = + 0.243
    r_FA   = + 0.266
    r_MA   = + 0.681
    r_FM.A = + 0.088
    r_FA.M = + 0.141


TABLE 5
Sample Table for One Component of Quadrant and Hyperquadrant Sums. Low
Values of Fuel (F−)

    F        T  M  A  D    TM  TA  TD  MA  MD  AD    TMA  TMD  TAD  MAD    TMAD
    98  −    +  −  −  −    −   −   −   +   +   +     +    +    +    −      −
    135 −    …
    140 −    …
    146 −    −  −  −  −    +   +   +   +   +   +     −    −    −    −      +
    147 −    …
    149 −    …
    151 −    …
    153 −    …
    155 −    −  +  +  +    −   −   −   +   +   +     −    −    −    +      −

Contributions to Sums
    FT  FM  FA  FD  FTM  FTA  FTD  FMA  FMD  FAD  FTMA  FTMD  FTAD  FMAD  FTMAD
    −2  +4  +7  +8  +2   +2   +2   −4   −4   −7   −2    −2    −2    +4    +2
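
The bookkeeping of Table 5 reduces to two small operations, sketched below with hypothetical function names. Signs are coded as +1 and −1; observations lying on a median (marked 0 in the text) are not handled in this sketch.

```python
def sign_product(*signs):
    """Algebraic product of signs, e.g. the TMAD entry from T, M, A and D."""
    product = 1
    for s in signs:
        product *= s
    return product


def column_contribution(column_signs, margin_sign):
    """Signed count of consecutive like signs from the top of a column.

    column_signs: entries of one column, top to bottom, each +1 or -1.
    margin_sign:  sign of the marginal variable (here -1, since F is below
                  its median throughout Table 5).
    """
    first = column_signs[0]
    run = 0
    for s in column_signs:
        if s != first:
            break
        run += 1
    return margin_sign * first * run
```

With seven consecutive + signs at the top of column AD and F negative, the contribution to FAD is −7, as stated in the text; for F = 155, the signs T −, M +, A +, D + give TMAD = −.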


9. Serial example. The following example, a sample of 144 observations of 
the thickness of inlay for relay springs cut consecutively from a single sheet of 
material, allows us to compare the resolution of the present test with that of 
the serial product-moment correlation. The data are from Shewhart [1, 1941, 
Table 1] and the serial correlations from lag 1 to lag 22 are from recent calcu- 
lations by Miss Dorothy T. Angell. The procedure for calculating the serial 
quadrant sums is similar to that for obtaining the sums for section 8. A table 
is prepared to show the observed comg§pecutive order of the numerical values and 
each is identified as above (+), below (—), or on the median (0). This gives a 
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table similar to one of the elements, say Fuel, in Table 4. Four computation 
tables similar to Table 5 are required, one for the equivalent of moving from 
the right, one from below, one from the left, and one from the top of a lag cor- 
relation scatter diagram. One table from each direction will take care of all 
lags. In the first, the marginal entries are the observed values listed in descend- 
ing numerical order. Opposite these are recorded from the previous table the 
signs associated with observations for each lag with respect to each entry. 
The second table would record the signs relating to the lags from the observed 
values arranged in ascending order. The third table would record the signs 
relating to leads from the observed values arranged in ascending order and the 
fourth, the signs relating to leads from the observed values arranged in descend- 
ing order. The sign of the contribution from each group is the algebraic product 
of the sign of the run and the sign of the marginal entries. The length of run 
is determined in the same way as in Table 5. Table 7 illustrates the procedure 


TABLE 7 
Relation of Lagged Observations to Median (+ = above, — = below) for Smallest Observations in Ascending Order 
* Contribution to Serial Quadrant Sum.


of determining the contribution from lags associated with the observations 
arranged in ascending order. 

Two serial quadrant sums may be computed—a circular serial quadrant sum 
or a noncircular serial quadrant sum. Circular items arise from considering 
that the beginning of the set of observations is a continuation of the end in the 
same way that this assumption is made in computing circular serial correlation 
coefficients. In Table 7, circular items are shown in parentheses and are omitted 
in calculating noncircular sums. In the particular table shown, the count of 
the run lengths was identical for both types of sum, but in other cases this may 
not be the case. Since the serial quadrant sum is relatively insensitive to 
sample size, the noncircular serial quadrant sum has for all practical purposes 
the same distribution as the circular quadrant sum. The correspondence in 
this case between the serial correlation coefficient for each lag up to 22 and 
the respective values of the two types of serial quadrant sums is shown in Fig. 5. 
Fig. 5. Comparative performance on a serial (autocorrelative) example. Curves: circular quadrant sums, circular product-moment correlations, and noncircular quadrant sums, plotted against lag.


10. Convergence to the limiting distribution. We shall consider several 
chance sums. One of these is $S$, which has the limiting distribution discussed 
in section 6. Another is $S_k$, which is the sum of four independent terms, each 
distributed according to the limiting distribution curtailed at $\pm k$. Its 
generating function is 

$$G(x) = \left( \sum_{i=1}^{k} 2^{-(i+1)} x^{i} + \sum_{i=1}^{k} 2^{-(i+1)} x^{-i} \right)^{4}.$$

The total probability assigned to $S_k = -k, -(k-1), \cdots, k$, is less than unity, 
so that there is nonzero probability that $S_k$ is not defined. The third is $S_n$, 
the quadrant sum itself, whose generating function is (5), and the fourth is the 
result of the same sort of curtailment applied to $S_n$. It will be denoted by 
$S_{n,k}$ and its generating function is 


i _ . citwd k a daa 4 
Gate) = DOP ee (Tbe), 
‘ j (2n)! i=1\n —j -1 inl j-1 


This again corresponds to a total probability less than unity. 
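
As a numerical aside, the curtailed sum $S_k$ can be tabulated directly. The sketch below assumes, in line with the generating function of $S_k$ given above, that each of the four independent terms takes the values $+i$ and $-i$ with probability $2^{-(i+1)}$ for $i = 1, \cdots, k$; the function names are illustrative and not part of the original text.

```python
from fractions import Fraction


def term_pmf(k):
    """One curtailed term: +i or -i with probability 2^-(i+1), i = 1..k (assumed form)."""
    pmf = {}
    for i in range(1, k + 1):
        p = Fraction(1, 2 ** (i + 1))
        pmf[i] = p
        pmf[-i] = p
    return pmf


def convolve(a, b):
    """Distribution of the sum of two independent integer-valued terms."""
    out = {}
    for va, pa in a.items():
        for vb, pb in b.items():
            out[va + vb] = out.get(va + vb, 0) + pa * pb
    return out


def curtailed_sum_pmf(k):
    """Distribution of the sum of four independent curtailed terms."""
    one = term_pmf(k)
    two = convolve(one, one)
    return convolve(two, two)


pmf = curtailed_sum_pmf(5)
total = sum(pmf.values())  # probability that the curtailed sum is defined
```

Under this assumption the total mass is $(1 - 2^{-k})^4$, which is less than unity for every finite $k$ and approaches unity as $k$ grows.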
It is clear that 


$$\Pr(S_{n,k} = m) \le \Pr(S_n = m)$$

and 

$$\Pr(S_k = m) \le \Pr(S = m).$$

We shall soon show that 

$$\lim_{n \to \infty} \Pr(S_{n,k} = m) = \Pr(S_k = m) \tag{6}$$

and this will imply that 

$$\lim_{n \to \infty} \Pr(S_n = m) = \Pr(S = m),$$


which is the desired result. The implication runs as follows: given $\epsilon$, we can 
choose $k$ so large that 

$$\Pr(S_k \text{ defined}) > 1 - \epsilon/3,$$

whence 

$$|\Pr(S_k = m) - \Pr(S = m)| < \epsilon/3,$$

and then choose $n$ so large that 

$$|\Pr(S_{n,k} = m) - \Pr(S_k = m)| < \frac{\epsilon}{24k + 6} \qquad \text{for } m = -4k, -4k + 1, \cdots, 4k,$$

whence 

$$\Pr(S_{n,k} \text{ defined}) > 1 - \frac{\epsilon}{3} - \frac{8k + 1}{24k + 6}\,\epsilon = 1 - \frac{16k + 3}{24k + 6}\,\epsilon$$

and hence 

$$|\Pr(S_{n,k} = m) - \Pr(S_n = m)| < \frac{16k + 3}{24k + 6}\,\epsilon,$$

this inequality holding automatically for $|m| > 4k$. Hence, 

$$|\Pr(S_n = m) - \Pr(S = m)| \le |\Pr(S_n = m) - \Pr(S_{n,k} = m)| + |\Pr(S_{n,k} = m) - \Pr(S_k = m)| + |\Pr(S_k = m) - \Pr(S = m)|$$

$$< \frac{16k + 3}{24k + 6}\,\epsilon + \frac{1}{24k + 6}\,\epsilon + \frac{\epsilon}{3} = \epsilon.$$
This method is clearly of general application in such problems. 


We turn now to the proof of (6). The expression for $G_{n,k}(x)$ shows that we 
may consider it the result of the following process: the integer j is a chance 
quantity with the distribution 


The first of these relations shows that $j/n$ converges stochastically to $\tfrac12$ as $n$ 
approaches infinity. The second shows, since 


Col 
n=-j-1 Ba~-t- ite pi i — IG ~ ))--- G -t +f 





(") (n —j7 — 1)\j — in! n(n — 1)(n — a. --(n — 2) 








(") oa oe 
; 


_ QAM —j-1+ Mj tt Vi 


n(n — 1)- —— — 1) 


(i+1) 





and both of these converge stochastically to $2^{-(i+1)}$ as $n$ approaches infinity, 
that $G_{n,k,j}(x)$ converges stochastically to $G(x)$. Since these curtailed generating 
functions involve only powers of $x$ in the finite range between $-4k$ and $+4k$, 
the limiting relation (6) follows at once. 


11. Effectiveness of normal approximation. Fig. 3 shows the relation between 
the asymptotic distribution of the quadrant sum for large $n$ and a normal 
distribution with variance 24, i.e., the same variance as that of the asymptotic 
distribution. The normal approximation is calculated from 

$$\Pr(|S_n| \ge m) \approx \Pr\left(|x| \ge \frac{m - \tfrac12}{\sqrt{24}}\right)$$

where $x$ is normally distributed with zero mean and unit variance. The asymptotic 
and normal curves agree surprisingly well out to the 5% point, and an error 
of a full unit in the significance level first occurs beyond the 0.5% point. 
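
The comparison can be sketched numerically. The code below is an illustration under two stated assumptions: that each of the four terms of the limiting sum takes the values $\pm i$ with probability $2^{-(i+1)}$ (which gives the variance 24 quoted above), and that the normal approximation uses a continuity correction of one half. The function names are hypothetical.

```python
import math


def limiting_pmf(k=60):
    """Approximate limiting distribution: four terms, each +-i w.p. 2^-(i+1), cut off at k."""
    one = {}
    for i in range(1, k + 1):
        one[i] = one[-i] = 2.0 ** -(i + 1)

    def conv(a, b):
        out = {}
        for va, pa in a.items():
            for vb, pb in b.items():
                out[va + vb] = out.get(va + vb, 0.0) + pa * pb
        return out

    two = conv(one, one)
    return conv(two, two)


def exact_tail(pmf, m):
    """Pr(|S| >= m) under the tabulated limiting distribution."""
    return sum(p for s, p in pmf.items() if abs(s) >= m)


def normal_tail(m, variance=24.0):
    """Two-sided normal tail with an assumed continuity correction of 1/2."""
    z = (m - 0.5) / math.sqrt(variance)
    return math.erfc(z / math.sqrt(2.0))


pmf = limiting_pmf()
variance = sum(s * s * p for s, p in pmf.items())
table = [(m, exact_tail(pmf, m), normal_tail(m)) for m in range(6, 16)]
```

Printing `table` gives the two tail probabilities side by side for $m = 6, \cdots, 15$, which is the range in which the 5% and 0.5% points lie under the normal approximation.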

Since the asymptotic distributions for the quadrant, octant, hexadecant, 
dotriacontant, $\cdots$, sums become more and more normal, the normal approximation 
will be even better for higher dimensions. In $r$ dimensions, this approximation 
consists in treating 

$$\frac{|S_n| - \tfrac12}{\sqrt{12r}}$$

as the absolute value of a standard deviate. This should be quite adequate for 
large samples and $r > 4$. 


12. Unsolved problems. The central unsolved problem in connection with 
the quadrant sum is: 


(1) What is the operating characteristic? 
This has as a corollary the more general question: 
(2) How can the operating characteristic of a nonparametric test be de- 
scribed so as to be useful to the users of the test? 
There are, of course, minor problems which are much more easily soluble. A 
few, listed in order of practical importance, are: 
(3) What is the effect on the significance levels of the use of lagged values 
of x as values of y? 
(4) What are the exact distributions for moderate n in three or more dimen- 
sions? 
(5) Do the analogous limiting distributions hold for three or more dimen- 
sions? 


(6) What is a better approximation to the limiting distribution for moderate 
n? 


To encourage others to solve some of these, we close with the assurance that 
they have our good wishes. 
DISCRIMINANT FUNCTIONS 


By George W. Brown 


Iowa State College 


1. Introduction: In the following sections the development of discriminant 
function techniques is approached from an elementary point of view, considering 
first an essentially trivial problem, then working up to the more complex situa- 
tions which may be handled by discriminant function methods. No attempt 
has been made to follow the pattern of the historical development in this process, 
and no consistent attempt has been made to allocate proper credit, in the text, 
to those individuals responsible for the introduction and exploitation of these 
methods. A more or less exhaustive bibliography of discriminant function 
applications and related theory is given at the end of this paper. 

Some historical perspective may be gained, however, from a very sketchy 
consideration of the early background of the subject. The first published 
application of the discriminant function seems to have been the work of Barnard 
(1935 [1]) on craniometry, following the suggestion of R. A. Fisher. Meanwhile 
P. C. Mahalanobis (1927, [30]; 1930, [31]) and, in this country, Hotelling (1931, 
[25]) had been concerned with a closely related problem, the construction of 
measures of the ‘‘distance’”’ between two sets of multiple measurements, for which 
Karl Pearson’s (1926, [34]) coefficient of racial likeness was not wholly adequate. 
Fisher (1936, [18]) gave a further example of the method and showed (1938, [19]) 
the relation between his work and that of Hotelling (1931, [25]; 1936, [27]). Thus 
the theory of discriminant function analysis proper is about ten years old, but is 
intimately related to researches which go back a few more years. 

A simple problem: Consider the very simple case of a single measurement, say 
$\xi$, which may be made in each of two populations, and let us suppose, for the 
sake of discussion, that $\xi$ is normally distributed, with unit variance, in each 
population, but with possibly different means in the two populations. 

Let 

$$E_1(\xi) = \alpha - \beta, \qquad E_2(\xi) = \alpha + \beta$$

be the mean values of $\xi$ over the two populations, with $\beta > 0$. As an example, 
we may consider the pH measurements of Iowa soil samples (Cox and Martin, 
[12]), for two soil populations, distinguished by the presence or absence of Azoto- 
bacter. From 100 samples containing Azotobacter and 186 samples containing 
no Azotobacter, we have the estimated averages of pH equal to 7.423 and 6.015 
respectively, with an estimated standard error of .625 within populations (see 
Fig. 1). 
$$\alpha = 6.719, \qquad \beta = .704, \qquad \sigma = .625, \qquad \beta/\sigma = 1.13.$$


Let us suppose further that $\xi$ is the only measurement available on a single 
individual, not knowing to which of populations 1 and 2 the individual belongs. 


Fig. 1. Distribution of pH measurements, with Azotobacter and without Azotobacter; marked values 6.015, 6.719, 7.423.


The problem is to classify this individual as a member of population 1 or 
population 2. It is clear that $\xi$ furnishes the only information on which to base a 
decision, and that essentially the only procedure available is to choose a number, 
say $\xi_0$, such that we choose population 1 when $\xi < \xi_0$ and population 2 when 
$\xi > \xi_0$. Furthermore, it is evident that the expected accuracy of classification 
depends on the size of $\beta$. If we wish to have equal risks of misclassification for 
members of the two populations we choose $\xi_0 = \alpha$. Then the probability of 
misclassification is given by $P\{\epsilon > \beta\}$, where $\epsilon$ is a normal deviate with unit 
variance. As one would expect, the probability of misclassification tends to 0 as 
$\beta \to \infty$ and tends to $\tfrac12$ as $\beta \to 0$. In the Azotobacter example, if we assume 
that the estimates given are the population values, we choose $\xi_0 = 6.719$. The 
ratio $\beta/\sigma = 1.13$ is exceeded approximately 13% of the time in sampling from 
the normal distribution, leading to .13 as the probability of misclassification. 
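
The arithmetic of the Azotobacter example can be reproduced in a few lines. This is only a sketch: the three input figures are those quoted in the text, treated as population values, and the variable and function names are illustrative.

```python
import math


def upper_tail(z):
    """Pr(deviate > z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


mean_with = 7.423      # mean pH, samples with Azotobacter
mean_without = 6.015   # mean pH, samples without Azotobacter
sd_within = 0.625      # estimated standard error within populations

alpha = (mean_with + mean_without) / 2   # separation value for equal risks
beta = (mean_with - mean_without) / 2    # half the distance between the means
ratio = beta / sd_within                 # normal deviate for misclassification
p_misclassify = upper_tail(ratio)

print(round(alpha, 3), round(beta, 3), round(ratio, 2), round(p_misclassify, 2))
```

Dividing by the within-population standard error puts the measurement on the unit-variance scale assumed in the discussion above.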

Consider now the slightly more general situation in which we consider a fixed 
variate, say $w$, with measurements $\xi$ distributed, for fixed $w$, with a mean of the 
form $\alpha + \beta w$. This is the standard regression situation. As before assume 
that $\xi$ is normally distributed about this mean with unit variance, that is 

$$\xi = \alpha + \beta w + \epsilon$$

where $\alpha$ and $\beta$ are constants, $w$ may take on any or all real values, and $\epsilon$ is a 
normal deviate. Note that if $w$ is restricted to take on only two values the 
structure reduces to the first structure considered. An example of the continuous 
type might be constructed by considering $w$ as genotypic yield of grain and $\xi$ 
a phenotypic measure of yield (Smith, [36]). 

The simple problem formulated for the two-population case may be reformulated 
here as follows: Given the relationship $\xi = \alpha + \beta w + \epsilon$, and given $\xi$ for 
an individual for which no other information is known, how shall we estimate $w$? 
For selective breeding the problem may be to select individuals for which $w$ is 
at one end of the scale, rather than to estimate $w$ itself. Whatever decision is 
to be made, it is still clear that $\xi$ furnishes the only available information, and 
that the certainty of the decision is a function of $\beta$. Since $(\xi - \alpha)/\beta = w + \epsilon/\beta$, 
the variance of this estimate of $w$ is $1/\beta^2$. Note that confidence intervals for $w$, 
given $\xi$, may be constructed from the normally distributed quantity $\xi - \alpha - \beta w$. 
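
A minimal sketch of this estimate and of the interval it implies, assuming $\alpha$ and $\beta$ are known and $\epsilon$ has unit variance as in the text. The function name and the default multiplier 1.96 (a two-sided 95% normal interval) are illustrative choices, not taken from the paper.

```python
def estimate_w(xi, alpha, beta, z=1.96):
    """Point estimate of w and a normal-theory interval, given one observation xi.

    Since xi - alpha - beta * w is a unit normal deviate, the estimate
    (xi - alpha) / beta has standard deviation 1 / |beta|.
    """
    if beta == 0:
        raise ValueError("beta must not vanish if xi is to discriminate among w values")
    w_hat = (xi - alpha) / beta
    half_width = z / abs(beta)
    return w_hat, (w_hat - half_width, w_hat + half_width)


w_hat, interval = estimate_w(xi=5.0, alpha=1.0, beta=2.0)
```

The interval shrinks in proportion to $1/|\beta|$, which is the sense in which the certainty of the decision is a function of $\beta$.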

It should be pointed out that in the usual regression case we are interested in 
predicting $\xi$ for given $w$, with the hypothesis as stated above, whereas in this 
case $\xi$ will be observed, and the problem is that of estimating, as a parameter of 
the distribution of $\xi$, the fixed variate $w$. 

Obviously $\beta$ must not vanish if $\xi$ is to perform any discrimination among $w$ 
values. In practice, of course, $\alpha$ and $\beta$ will not be given as known values and the 
variance of $\epsilon$ will not be known, but a finite set of observations may be available, 
for which $w$ values are known and $\xi$ has been observed. The usual analysis of 
variance provides a significance test for the non-vanishing of $\beta$, which is 
equivalent to testing for the significance of the regression of $\xi$ on $w$. 

It is to be noted that this analysis reduces to the conventional between-within 
analysis ($F$ or $t$-test) when we have the special case of two populations. 
Moreover, if we had treated $\xi$ as the fixed variate instead of $w$, and considered the 
regression of $w$ on $\xi$, the analysis of variance would have differed only in replacing 
$\Sigma(\xi - \bar\xi)^2$ throughout by $\Sigma(w - \bar w)^2$ and the relevant $F$-test would have been 
unchanged. 

When probabilities of misclassification are estimated from finite samples, as 
in the soil classification example, there are three sources of error: sampling error 
in the estimate of the separation value $\xi_0$, sampling error in the estimate of the 
distance between the population means, and sampling error in the estimated 
standard deviation of $\xi$ within populations. It does not appear difficult to set 
up confidence intervals for the probability of misclassification, assuming repeated 
classification of individuals given fixed initial samples. 
2. The one-dimensional discriminant function. We have been dealing so far 
with the simple situation in which only one measurement per individual is 
available for purposes of discrimination. Suppose we still have this measurement, 
call it $\xi_1$ now, but we have other measurements as well, say $\xi_2, \cdots, \xi_p$. 
As before $\xi_1 = \alpha_1 + \beta w + \epsilon_1$. For the moment suppose that the remaining 
measurements have mean values independent of $w$, so that 

$$\xi_m = \alpha_m + \epsilon_m, \qquad (m = 2, \cdots, p),$$

and let us assume also that the $\{\epsilon_m\}$ are mutually independent $(m = 1, 2, \cdots, p)$ 
and are normal deviates with unit variance. It is safe to assume that nobody 
would ever argue, in this case, that the measurements $\xi_2, \cdots, \xi_p$ provide 
information about the $w$ value for an individual. If, then, we were so fortunate 
that we were in this situation, and knew so, we could say that $\xi_1$ is our 
discriminant function, since, if any discriminating is to be done, $\xi_1$ has to do it. 

TABLE 1 
Analysis of Variance for Regression 

             d.f.     Sums of Squares 
Regression   1        $r^2 \Sigma(\xi - \bar\xi)^2$ 
Error        N - 2    $(1 - r^2) \Sigma(\xi - \bar\xi)^2$ 
Total        N - 1    $\Sigma(\xi - \bar\xi)^2$ 

$$r = \frac{\Sigma(\xi - \bar\xi)(w - \bar w)}{\sqrt{\Sigma(\xi - \bar\xi)^2 \, \Sigma(w - \bar w)^2}}$$


Suppose now that the measurements $\xi_1, \xi_2, \cdots, \xi_p$ are not explicitly 
available, but that we are able to observe a linearly equivalent set $x_1, x_2, \cdots, x_p$, 
related to the $\{\xi_n\}$ by the transformation 

$$x_m = \sum_{n=1}^{p} l_{mn} \xi_n,$$

where the $l_{mn}$ are unknown. For fixed $w$, $x_m$ has expected value 

$$\sum_{n=1}^{p} l_{mn} \alpha_n + l_{m1} \beta w = a_m + b_m w,$$

so that in general each $x_m$ observation provides information about $w$. Moreover, 
the $x_m$ are not in general mutually independent; it is evident that the 
population matrix of variances and covariances for fixed $w$ is given by 

$$\sigma_{mn} = \sum_{k=1}^{p} l_{mk} l_{nk}.$$

As an example of a set of correlated measurements, consider the Azotobacter 
example referred to above. In addition to pH values, determinations of available 
phosphate content and total nitrogen content were made on soil samples 
in each of the two populations. Means were as follows: 

                                          pH      Phosphate   Nitrogen 
Mean of 100 samples with Azotobacter      7.423   133.120     29.400 
Mean of 186 samples without Azotobacter   6.015    51.113     21.140 
Mean difference                           1.408    82.007      8.260 

Clearly the differences are proportional to the hypothetical $b_m$’s. The 
variance-covariance matrix, estimated from the 284 degrees of freedom within populations, 
is given by Table 2. 











TABLE 2 
$284(\sigma_{mn})$ = 

             pH          Phosphate         Nitrogen 
pH           111.0879    2,292.7192        198.4026 
Phosphate                1,042,799.1890    5,066.2645 
Nitrogen                                   29,422.3655 

Estimated correlation coefficients within populations are not large, .213 for pH 
and Phosphate, .110 for pH and Nitrogen, and .029 for Phosphate and Nitrogen. 
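
The quoted correlations follow directly from the entries of Table 2, since the common factor 284 cancels. A short check, with the matrix entries copied from the table and the helper name chosen only for illustration:

```python
import math

names = ["pH", "Phosphate", "Nitrogen"]

# Within-population sums of squares and products, 284 * (sigma_mn), from Table 2.
ss = [
    [111.0879, 2292.7192, 198.4026],
    [2292.7192, 1042799.1890, 5066.2645],
    [198.4026, 5066.2645, 29422.3655],
]


def within_correlation(i, j):
    """Correlation within populations between measurements i and j."""
    return ss[i][j] / math.sqrt(ss[i][i] * ss[j][j])


for i in range(3):
    for j in range(i + 1, 3):
        print(names[i], names[j], round(within_correlation(i, j), 3))
```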

Another example is furnished by Fisher’s Iris measurements [8], provid- 
ing sepal length, sepal width, petal length, and petal width for each of 50 
individuals of Iris setosa and 50 individuals of Iris versicolor. This example is 
an unfortunate one in that either petal length or petal width alone is sufficient 
to discriminate the two populations as completely as anybody has a right to 
expect anytime. The petal lengths, for example, vary between 1.0 and 1.9 cm. 
for the 50 setosa, and between 3.0 and 5.1 cm. for the 50 versicolor. 

Let us proceed, under the assumption that available measurements, $x_m$, 
are distributed normally about mean values $a_m + b_m w$, with variance-covariance 
matrix $\sigma_{mn}$ for fixed $w$, keeping in mind the underlying model of $\xi_1, \xi_2, \cdots, \xi_p$, 
with 

$$x_m = \sum_{n=1}^{p} l_{mn} \xi_n; \qquad \xi_1 = \alpha_1 + \beta w + \epsilon_1; \quad \xi_2 = \alpha_2 + \epsilon_2; \quad \cdots; \quad \xi_p = \alpha_p + \epsilon_p.$$

The skeptic may wish to grant the first part of our assumptions without granting 
the hypothetical structure of $\xi$’s underlying the $x$’s. Hotelling’s work [27] 
shows that such an underlying structure of $\xi$’s may always be provided, given 
the distribution of $x$’s for fixed $w$. In other words, a distribution of $x$’s for fixed 
$w$ leads essentially uniquely to an underlying $\xi$ model. 

The discriminant function, given $\sigma_{mn}$, $a_m$ and $b_m$, for $m, n = 1, 2, \cdots, p$, 
is 

$$X = \sum_{m,n=1}^{p} \sigma^{mn} b_m x_n = \sum_{n=1}^{p} t_n x_n$$

where 

$$t_n = \sum_{m=1}^{p} \sigma^{mn} b_m,$$

and $\sigma^{mn}$ is the reciprocal matrix to $\sigma_{mn}$. That is, the $\sigma^{mn}$ are the solutions of 
the linear systems [17] 

$$\sum_{s=1}^{p} \sigma^{ms} \sigma_{sn} = 0 \quad \text{if } m \ne n, \qquad m, n = 1, 2, \cdots, p,$$

$$\sum_{s=1}^{p} \sigma^{ms} \sigma_{sm} = 1, \qquad m = 1, \cdots, p.$$
That $X$, as defined above, is properly called the discriminant function will 
become evident immediately. Putting $b_m = l_{m1}\beta$, $x_n = \sum_{k=1}^{p} l_{nk}\xi_k$, we have 

$$X = \beta \sum_{m,n,k} \sigma^{mn} l_{m1} l_{nk} \xi_k.$$

Recalling that the $\sigma^{mn}$ are reciprocal to $\sigma_{mn} = \sum_k l_{mk} l_{nk}$, it can be seen that 
$\sum_{m,n} \sigma^{mn} l_{m1} l_{nk} = 1$ if $k = 1$, and vanishes for $k \ne 1$. It follows that 

$$X = \beta \xi_1,$$

in other words, $X$ calculated as $\sum_{m,n} \sigma^{mn} b_m x_n$ from known population quantities 
is proportional to the hypothetical $\xi_1$, the only one of the underlying 
measurements which is related to $w$, thus justifying the term discriminant function for 
$X$. It is clear that any other linear function of the $x$’s is also a linear function of 
the $\xi$’s, and can discriminate, at best, only as well as $X$ itself, since all the $\xi$’s 
are independent of $w$, with the exception of $\xi_1$. $X$ itself discriminates $w$ to the 
same extent that $\xi_1$, were it available, would discriminate. 
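
The identity $X = \beta\xi_1$ is easy to confirm numerically. In the sketch below the mixing matrix, the value of $\beta$, and the hypothetical measurements are arbitrary illustrative numbers, not data from the paper; any nonsingular matrix of coefficients $l_{mn}$ should give the same agreement.

```python
import numpy as np

beta = 1.7
# Hypothetical nonsingular matrix of coefficients l_mn (rows m, columns n).
L = np.array([
    [2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.5, -1.0, 3.0],
])
xi = np.array([1.2, -0.4, 2.5])   # hypothetical underlying measurements xi_1..xi_p

x = L @ xi                        # observed measurements x_m = sum_n l_mn xi_n
sigma = L @ L.T                   # sigma_mn = sum_k l_mk l_nk
b = L[:, 0] * beta                # b_m = l_m1 * beta

t = np.linalg.solve(sigma, b)     # t_n = sum_m sigma^{mn} b_m
X = float(t @ x)                  # discriminant function value

print(X, beta * xi[0])
```

Solving the linear system for the coefficients avoids forming the reciprocal matrix explicitly, which is a numerical convenience rather than anything required by the argument.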

The degree of discrimination of $w$’s depends, as indicated in the previous 
section, on the ratio of the mean square of $\xi_1$ among $w$’s (mean square for 
regression), to the mean square of $\xi_1$ for fixed $w$ (mean square for error). Since $X$ 
is proportional to $\xi_1$, the same is true when $X$ is substituted for $\xi_1$. It turns 
out, of course, that $X$ is that linear combination of $x$’s for which the ratio of the 
mean square for regression to the mean square for error is a maximum, or, what 
is the same thing, $X$ is that linear combination of $x$’s which has the maximum 
correlation with $w$. From any point of view $X$ appears to be the logical function 
of $x$’s to compute. It is clear that $\lambda X$ is precisely as good as $X$, if $\lambda$ is any 
nonzero constant. 
In the two population case, where $w$ takes on only two values, $X$ is evidently 
proportional to $\Sigma \sigma^{mn}(\mu_{m1} - \mu_{m2})x_n$, where $\mu_{m1}$ and $\mu_{m2}$ are the mean values of 
$x_m$ in the two populations. $X$ is here the particular linear combination of $x$’s 
for which the ratio of the mean square between populations to the mean square 
within populations is a maximum. The value of this ratio, which measures the 
degree of discrimination possible, depends on the spread of the means of $X$ 
between the populations, or in general, on the spread of the means of $X$ over some 
given distribution of $w$’s. Given $\sigma_{mn}$ and $b_m$, the larger the spread of $w$ values 
the better overall discrimination will be obtainable. On the other hand, the 
coefficients for $X$ depend only on $\sigma_{mn}$ and $b_m$. 

Since $X$ is proportional to $\xi_1$, it follows that the discriminant function is 
invariant under non-singular linear transformation of the $x$’s, that is, if some set of 
y’s, linearly dependent on the x’s, had been observed, together with their means, 
variances and covariances, the discriminant values would not have changed. 
This invariance is obviously a desirable property, and as such was one of the 
goals of Fisher, Hotelling, and Mahalanobis. One more property of the dis- 
criminant function is of interest; X is essentially equivalent to the maximum 
likelihood estimate of w. 

In our statistical model $w$ plays the role of a fixed variate or population 
parameter, and the $x$’s have a joint distribution about linear functions of $w$ as means. 
Suppose now that $(\sigma_{mn})$ and $\{b_m\}$ are estimated from an analysis of variance 
and covariance on data for which $w$ as well as $x$ values are known. The problem 
of estimating $w$ for a single individual whose $x$ measurements are given resolves 
into a two-stage estimation process, the first stage being the estimation of 
$(\sigma_{mn})$ and $\{b_m\}$ from the initial data, the second stage being the estimation of $w$ 
by the discriminant function whose coefficients are computed from the 
estimated $(\sigma_{mn})$ and $\{b_m\}$. It has already been pointed out that $X$ is the linear 
combination of $x$’s which has greatest correlation with $w$. It turns out, then, 
that the coefficients of $X$ are proportional to those which would have been 
obtained from a formal regression analysis of $w$ on $x_1, x_2, \cdots, x_p$, considering the 
$x$’s as independent variables and $w$ as dependent variable, a direct interchange of 
roles as compared with the statistical model we have assumed. Of course two 
linear functions differing only by a factor of proportionality are equivalent in 
discrimination. If the formal analysis of variance is carried out for testing the 
significance of the regression of $w$ on $x_1, x_2, \cdots, x_p$, the relevant $F$ ratio 
remains a valid test for the non-vanishing of the $b_m$, in spite of the inversion of 
dependent and independent variables. The analysis of variance is given in 
Table 3. 
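
The proportionality between the formal regression coefficients and the discriminant coefficients can be checked on simulated data. The sketch below uses the two-population case with $w$ coded as 1 and 0; the sample sizes, mean shifts, and random seed are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, p = 30, 40, 3
x1 = rng.normal(size=(n1, p)) + np.array([1.0, 0.5, 0.0])  # population with w = 1
x2 = rng.normal(size=(n2, p))                              # population with w = 0
x = np.vstack([x1, x2])
w = np.concatenate([np.ones(n1), np.zeros(n2)])

# Discriminant coefficients: reciprocal of the pooled within matrix applied
# to the difference of the sample means.
d = x1.mean(axis=0) - x2.mean(axis=0)
r1 = x1 - x1.mean(axis=0)
r2 = x2 - x2.mean(axis=0)
within = (r1.T @ r1 + r2.T @ r2) / (n1 + n2 - 2)
discriminant = np.linalg.solve(within, d)

# Formal regression of w on x_1..x_p, with an intercept.
design = np.column_stack([np.ones(len(w)), x])
coef, *_ = np.linalg.lstsq(design, w, rcond=None)
slopes = coef[1:]

# Cosine of the angle between the two coefficient vectors; 1 means proportional.
cosine = slopes @ discriminant / (np.linalg.norm(slopes) * np.linalg.norm(discriminant))
print(cosine)
```

The two vectors differ only by a positive scale factor, so they rank individuals identically.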

R is, of course, the conventional multiple correlation coefficient. An equiva- 
lent analysis can be carried out for X itself, allowing sufficient degrees of freedom 
for the estimation of the constants in X, as given in Table 4. 

This analysis is proportional to the analysis given above. It might be noted 
that the mean square corresponding to error sum of squares in this analysis is 
$\Sigma \sigma^{mn} b_m b_n$, which is $X$ evaluated for $x_n = b_n$, $(n = 1, 2, \cdots, p)$. 
TABLE 3 
Analysis of Variance for Regression 

             d.f.        Sums of Squares 
Regression   p           $R^2 \Sigma(w - \bar w)^2$ 
Error        N - p - 1   $(1 - R^2) \Sigma(w - \bar w)^2$ 
Total        N - 1       $\Sigma(w - \bar w)^2$ 

TABLE 4 
Analysis of Variance for X on w 

             d.f.        Sums of Squares 
Regression   p           $R^2 \Sigma(X - \bar X)^2$ 
Error        N - p - 1   $(1 - R^2) \Sigma(X - \bar X)^2$ 
Total        N - 1       $\Sigma(X - \bar X)^2$ 

In the Azotobacter example, Cox and Martin arrive at a discriminant function 
which has the analysis given in Table 5. 

TABLE 5 
Analysis of Variance of Discriminant Function 

                      d.f.   Sums of Squares   Mean Square 
Between populations   3      .030842           .01028 
Within populations    282    .021777           .00007722 
Total                 285 

It is evident that the difference between populations is highly significant. The 
choice of scale for $X$ in this case forces the sum of squares within populations to 
be equal to the difference between the mean $X$ values for the two populations. 
Thus the mean $X$ differs by .021777 for the two populations, and has an estimated 
standard error, within populations, equal to $\sqrt{.00007722} = .008788$. 
Half the difference, divided by the standard error, is the normal deviate 
corresponding to misclassification, if equal risks are taken. In this case the value 
of the normal deviate is 1.24, approximately, leading to an estimated probability 
of misclassification of about .11, which is not very much better than the .13 
which one would have obtained if pH alone had been used. 
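
The deviate of about 1.24 can be recovered from the figures already quoted, without the discriminant scores themselves. The sketch below is a reconstruction under stated assumptions: it divides the Table 2 sums of squares and products by 284 to estimate $\sigma_{mn}$ and takes the mean differences from the table of means; it is not the computation of Cox and Martin, whose analysis uses 282 degrees of freedom within populations.

```python
import math
import numpy as np

# 284 * (sigma_mn) from Table 2, in the order pH, Phosphate, Nitrogen.
ss = np.array([
    [111.0879, 2292.7192, 198.4026],
    [2292.7192, 1042799.1890, 5066.2645],
    [198.4026, 5066.2645, 29422.3655],
])
cov = ss / 284.0
diff = np.array([1.408, 82.007, 8.260])   # mean differences between populations

coef = np.linalg.solve(cov, diff)         # discriminant coefficients, up to scale
distance2 = float(diff @ coef)            # squared distance between the populations
deviate = math.sqrt(distance2) / 2        # half the difference over the standard error
p_misclassify = 0.5 * math.erfc(deviate / math.sqrt(2.0))

print(round(deviate, 2), round(p_misclassify, 2))
```

The small difference from the value obtained via Table 5 comes only from the choice of divisor (284 here against 282 there).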

In this problem, as in conventional regression analysis, it is tempting, for 
various reasons, to consider the possibility of using smaller sets of classifying 
measurements. Moreover, a significance test for this situation is in general 
more interesting, as a practical matter, than the significance test for differences 
among populations, since the initial presumption is that we are interested in 
being able to discriminate, on the basis of $x_1, x_2, \cdots, x_p$. Suppose, for 
example, we wish to test whether the discriminant function $X_{(p)}$ based on $x_1, 
x_2, \cdots, x_p$ is significantly better than the discriminant function $X_{(r)}$ based on 
$x_1, \cdots, x_r$, with $r < p$. The relevant test is precisely the same as the test 
calculated formally from the regression of $w$ on the sets $x_1, \cdots, x_r$ and $x_1, 
x_2, \cdots, x_p$, with the analysis of variance given in Table 6. 

TABLE 6 
Analysis of Variance for Rejecting $x_{r+1}, \cdots, x_p$ 

                                                          Sums of Squares   d.f. 
Regression on $x_1, \cdots, x_r$                            $S_r$             r 
Regression on $x_1, \cdots, x_r, x_{r+1}, \cdots, x_p$        $S_p$             p 
Difference                                                $S_p - S_r$       p - r 
Error                                                     $S_T - S_p$       N - p - 1 
Total                                                     $S_T$             N - 1 

Similarly, if we wish to test for the significance of a theoretical discriminant 
function, $X_0$, with preassigned coefficients, as compared with $X_{(p)}$, we have 
again the conventional test calculated from the formal analysis of the regression 
of $w$ on $x_1, x_2, \cdots, x_p$, as given in Table 7. 

TABLE 7 
Analysis of Variance for $X = X_0$ 

                                       Sums of Squares   d.f. 
Regression on $X_0$                      $S_1$             1 
Regression on $x_1, \cdots, x_p$         $S_p$             p 
Difference                             $S_p - S_1$       p - 1 
Error                                  $S_T - S_p$       N - p - 1 
Total                                  $S_T$             N - 1 

As shown by Fisher [21] the relevant $F$-test for this hypothesis is computable 
as 

where $R'^2 = R^2(1 - r^2)$, $r$ is the correlation between $X$ and $X_0$ for fixed $w$, and 
$R$ is the multiple correlation for $w$ on $x_1, \cdots, x_p$, or, what is the same thing, the 
correlation of $w$ and $X$. 
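
For the comparison of nested sets of measurements in Table 6, the difference and error rows combine in the usual way for regression. The sketch below assumes the standard extra-sum-of-squares form of the $F$ ratio, with $p - r$ and $N - p - 1$ degrees of freedom; the function name and the numbers in the usage line are hypothetical.

```python
def subset_f_ratio(s_r, s_p, s_t, r, p, n):
    """F ratio for rejecting x_{r+1}..x_p, built from the rows of Table 6.

    s_r: regression sum of squares on x_1..x_r
    s_p: regression sum of squares on x_1..x_p
    s_t: total sum of squares, with n - 1 degrees of freedom
    """
    if not 0 <= r < p < n - 1:
        raise ValueError("need 0 <= r < p < n - 1")
    difference_ms = (s_p - s_r) / (p - r)
    error_ms = (s_t - s_p) / (n - p - 1)
    return difference_ms / error_ms


# Hypothetical sums of squares, for illustration only.
f_ratio = subset_f_ratio(s_r=40.0, s_p=50.0, s_t=100.0, r=1, p=3, n=24)
```

The same function covers the layout of Table 7 by setting $r = 1$ and passing the regression sum of squares on $X_0$ as `s_r`.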

The example of Smith [36] is an example in which the relationships of $x$’s 
to w have to be estimated from analysis of variance and covariance of data in 
which the w’s are not really known, being related to genotypes. The regression 
of x’s on w is estimated by a generalization of the components-of-variance 
method, from variance-covariance analyses in which the usual null hypotheses 
are significantly contradicted. The net effect is that the usual significance 
tests now fail to hold, although the algebraic calculations are formally equivalent 
to those given above, once the population relations of x’s to w are established. 
When work of this kind is based on small samples, there is some difficulty in 
estimating the reliability of the results. 


3. Multi-dimensional discriminant functions. Instead of trying to discriminate 
between two populations or estimate a single parameter $w$, our problem may 
be to discriminate among several populations, not necessarily linearly related, 
or to estimate many independent parameters $w_1, w_2, \cdots, w_s$. Just as a single 
parameter $w$ is sufficient to distinguish between means of measurements for two 
different populations, $s$ parameters are sufficient to distinguish between means 
of $s + 1$ different populations, and exactly $s$ parameters will be required, if 
no linear relation obtains among the $s + 1$ populations. For example, with 
three populations, any measurement mean may be given the three possible 
values $\alpha$, $\alpha + \beta$, $\alpha + \gamma$, corresponding to $w_1 = w_2 = 0$ for population 1, $w_1 = 1$, 
$w_2 = 0$ for population 2, and $w_1 = 0$, $w_2 = 1$ for population 3. Geometrically 
we have to consider a set of parameter values as a point in an $s$-dimensional 
space. 

The one-dimensional discriminant function admits two very different general- 
izations in higher dimensions. The practical solution to a particular problem 
for which s is moderately large may involve a mixture of both generalizations. 

Let us generalize our statistical model before discussing the discrimination 
problem. To avoid complication of algebraic notation, let us for the moment 
assume $s = 2$. We will now postulate a set of hypothetical measurements 
$\xi_1, \xi_2, \cdots, \xi_p$, with 

$$\xi_1 = \alpha_1 + \beta_1 u + \gamma_1 v + \epsilon_1$$
$$\xi_2 = \alpha_2 + \beta_2 u + \gamma_2 v + \epsilon_2$$
$$\xi_3 = \alpha_3 + \epsilon_3$$
$$\cdots$$
$$\xi_p = \alpha_p + \epsilon_p,$$

where the $\epsilon_m$ are independent normal deviates with unit variance, $u$ and $v$ are 
fixed variates or parameters corresponding to the different populations, and 
$\alpha_1, \alpha_2, \cdots, \alpha_p$, $\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$ are constants. Evidently $\xi_3, \cdots, \xi_p$ can 
yield no information about $u$ and $v$; $\xi_1$ and $\xi_2$ together contain all the information 
there is to get about $u$ and $v$. As before, assume that our data will be in the form 
of linear combinations $x_m = \Sigma_n l_{mn}\xi_n$, with unknown coefficients $l_{mn}$. The 
variance-covariance matrix within populations, or for fixed $u$, $v$, is still given by 
$\sigma_{mn} = \Sigma_k l_{mk} l_{nk}$. The mean values of the $x$’s for fixed $u$, $v$ are given by 

$$E(x_m) = \Sigma_n l_{mn}\alpha_n + (l_{m1}\beta_1 + l_{m2}\beta_2)u + (l_{m1}\gamma_1 + l_{m2}\gamma_2)v = a_m + b_m u + c_m v.$$

This model is again justifiable on the basis of Hotelling’s work. 

The first question to ask is whether we can now form two linear combinations 
of the $x$’s and get rid of $\xi_3, \cdots, \xi_p$ in both, thus providing a two dimensional 
description of an individual on the basis of $x_1, x_2, \cdots, x_p$. The answer here 
is in the affirmative, as a result of a direct generalization of the method 
discussed earlier. If we calculate $X_1 = \Sigma \sigma^{mn} b_m x_n$ and $X_2 = \Sigma \sigma^{mn} c_m x_n$, we are 
fortunate enough to get 

$$X_1 = \beta_1 \xi_1 + \beta_2 \xi_2$$
$$X_2 = \gamma_1 \xi_1 + \gamma_2 \xi_2$$

with no disturbing elements from $\xi_3, \cdots, \xi_p$. Assuming for now that $X_1$ and 
$X_2$ are not merely proportional, i.e. $\beta_1\gamma_2 - \beta_2\gamma_1 \ne 0$, what do we do with $X_1$ and 
$X_2$? 

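For fixed $u$, $v$, we have 

$$E(X_1) = \Sigma \sigma^{mn} b_m a_n + u \Sigma \sigma^{mn} b_m b_n + v \Sigma \sigma^{mn} b_m c_n = A_1 + B_1 u + C_1 v$$
$$E(X_2) = \Sigma \sigma^{mn} c_m a_n + u \Sigma \sigma^{mn} c_m b_n + v \Sigma \sigma^{mn} c_m c_n = A_2 + B_2 u + C_2 v$$

and variances and covariance 

$$\tau_{11} = \Sigma \sigma^{mn} b_m b_n = B_1$$
$$\tau_{12} = \Sigma \sigma^{mn} b_m c_n = C_1 = B_2$$
$$\tau_{22} = \Sigma \sigma^{mn} c_m c_n = C_2.$$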

We may, for example, estimate $u$ and $v$ by solving the equations 

$$B_1 u + C_1 v = X_1 - A_1$$
$$B_2 u + C_2 v = X_2 - A_2,$$

or we may set up regions in the $X_1$, $X_2$ plane for which certain decisions are 
made. For example, when classifying an individual into one of three populations, 
we might delineate regions, as in Fig. 2. 

Then the particular individual would be classified as coming from population I, 
II, or III, according to which region $X_1$, $X_2$ falls in. The individual points 
shown in the figure represent the expected values of $X_1$, $X_2$ for each of the three 
populations. No exhaustive investigation has been made for this situation, but 
some fairly obvious methods are available for constructing such regions. 
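
Both uses of $X_1$, $X_2$ can be sketched briefly. The first function solves the pair of equations for $u$ and $v$. The second assigns an individual to the population whose expected point is nearest in the metric of the within-population covariance of $X_1$, $X_2$; this nearest-mean rule is only one possible way of constructing regions and is an assumption of the sketch, not a method stated in the text. All names and numbers are illustrative.

```python
import numpy as np


def estimate_uv(X, A, B, C):
    """Solve B_i u + C_i v = X_i - A_i (i = 1, 2) for u and v."""
    coefficients = np.array([[B[0], C[0]], [B[1], C[1]]], dtype=float)
    rhs = np.asarray(X, dtype=float) - np.asarray(A, dtype=float)
    return np.linalg.solve(coefficients, rhs)


def classify(X, means, tau):
    """Assign X to the population with the nearest expected point (assumed rule)."""
    X = np.asarray(X, dtype=float)
    tau_inv = np.linalg.inv(np.asarray(tau, dtype=float))

    def distance2(label):
        diff = X - np.asarray(means[label], dtype=float)
        return float(diff @ tau_inv @ diff)

    return min(means, key=distance2)


u, v = estimate_uv(X=[5.0, 4.0], A=[1.0, 1.0], B=[2.0, 1.0], C=[1.0, 3.0])
label = classify([2.5, 0.2], {"I": (0.0, 0.0), "II": (3.0, 0.0), "III": (0.0, 3.0)}, np.eye(2))
```

With this rule the regions are bounded by straight lines, since the same covariance matrix is used for every population.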

With respect to significance tests when the $\sigma_{mn}$, $a_m$, $b_m$, $c_m$ are estimated from 
samples, the whole gamut of multivariate analysis has to be run. Tests 
analogous to (but more complicated than) $F$ tests exist for testing the significance 
of the discrimination, the significance of a subset of the $x$’s, and the significance 
of a theoretical pair $X_{1,0}$, $X_{2,0}$ (Wilks [41], [42], [43]). 

Fig. 2. Classification regions in the $X_1$, $X_2$ plane. 

For some purposes a two-dimensional discriminant function $X_1, X_2$ may be
unsatisfactory. For example, we might suspect that
$\beta_1 \gamma_2 = \beta_2 \gamma_1$ (or that the relationship is nearly
satisfied). Under these circumstances $X_1$ is (nearly) proportional to $X_2$,
and we would like to compute the best one-dimensional discriminant function,
even though we have started with two linear parameters u and v. Even if
$\beta_1 \gamma_2 \neq \beta_2 \gamma_1$ we might still ask for the best
one-dimensional discriminant function, in order to rank our populations on the
"best" linear scale. If we define Y as that linear combination of
$x_1, x_2, \ldots, x_p$ which has the largest multiple correlation with u and
v, we have generalized the simple one-dimensional discriminant function in a
second direction.

Before proceeding, it is useful to recognize that Y, as defined above, must be a
function of $X_1, X_2$, since $X_1$ and $X_2$ together contain all the
information about u and v that can be obtained from the x's.


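Now suppose we consider an arbitrary linear combination
$Y = \lambda_1 X_1 + \lambda_2 X_2$. Y correlates best with

$$\lambda_1(\tau_{11} u + \tau_{12} v) + \lambda_2(\tau_{12} u + \tau_{22} v) = (\lambda_1 \tau_{11} + \lambda_2 \tau_{12}) u + (\lambda_1 \tau_{12} + \lambda_2 \tau_{22}) v.$$

We now have to choose $\lambda_1$ and $\lambda_2$ to maximize this correlation.
This correlation will be maximized if we maximize the ratio of the variance of

$$(\lambda_1 \tau_{11} + \lambda_2 \tau_{12}) u + (\lambda_1 \tau_{12} + \lambda_2 \tau_{22}) v$$

(over the distribution of u and v values) to the variance of Y for fixed u and
v. Call the first quantity $S_1$, the second $S_2$. Then
$S_2 = \lambda_1^2 \tau_{11} + 2 \lambda_1 \lambda_2 \tau_{12} + \lambda_2^2 \tau_{22}$
and $S_1$ is of the form
$\lambda_1^2 \mu_{11} + 2 \lambda_1 \lambda_2 \mu_{12} + \lambda_2^2 \mu_{22}$,
where

$$\mu_{11} = \tau_{11}^2 \sigma_{uu} + 2 \tau_{11} \tau_{12} \sigma_{uv} + \tau_{12}^2 \sigma_{vv}$$
$$\mu_{12} = \tau_{11} \tau_{12} \sigma_{uu} + (\tau_{12}^2 + \tau_{11} \tau_{22}) \sigma_{uv} + \tau_{12} \tau_{22} \sigma_{vv}$$
$$\mu_{22} = \tau_{12}^2 \sigma_{uu} + 2 \tau_{12} \tau_{22} \sigma_{uv} + \tau_{22}^2 \sigma_{vv}.$$

Maximizing $S_1/S_2$ leads to the equations:

$$\lambda_1 \tau_{11} + \lambda_2 \tau_{12} = \frac{S_2}{S_1} (\lambda_1 \mu_{11} + \lambda_2 \mu_{12})$$
$$\lambda_1 \tau_{12} + \lambda_2 \tau_{22} = \frac{S_2}{S_1} (\lambda_1 \mu_{12} + \lambda_2 \mu_{22})$$

i.e.

$$\lambda_1 (\mu_{11} - \theta \tau_{11}) + \lambda_2 (\mu_{12} - \theta \tau_{12}) = 0$$
$$\lambda_1 (\mu_{12} - \theta \tau_{12}) + \lambda_2 (\mu_{22} - \theta \tau_{22}) = 0, \quad \text{with } \theta = S_1/S_2.$$

It is thus seen that $\theta$ must satisfy the quadratic equation

$$(\mu_{11} - \theta \tau_{11})(\mu_{22} - \theta \tau_{22}) - (\mu_{12} - \theta \tau_{12})^2 = 0,$$

in order for solutions $\lambda_1, \lambda_2$ (not both zero) to exist. In
general there will be two solutions, of which the greater corresponds to that
linear combination $\lambda_1 X_1 + \lambda_2 X_2$ which has greatest multiple
correlation with u and v, whereas the smaller corresponds to that linear
combination which has least multiple correlation with u and v. $\theta$ itself
corresponds to $R^2/(1 - R^2)$ for the regression of
$\lambda_1 X_1 + \lambda_2 X_2$ on u, v.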
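
The maximization above can be sketched numerically. The following is a minimal illustration, not part of the original paper: the function names and the numerical inputs are hypothetical, and it assumes $\tau$ is positive definite. It builds the $\mu$'s from the $\tau$'s and the variances of u and v, solves the quadratic $\det(\mu - \theta\tau) = 0$ for the stationary values of $S_1/S_2$, and returns a pair $(\lambda_1, \lambda_2)$ for each root.

```python
import math


def mu_from(tau, sigma):
    """Entries (m11, m12, m22) of mu, given tau = (t11, t12, t22) and
    sigma = (suu, suv, svv), written out term by term as in the text."""
    t11, t12, t22 = tau
    suu, suv, svv = sigma
    m11 = t11 * t11 * suu + 2 * t11 * t12 * suv + t12 * t12 * svv
    m12 = t11 * t12 * suu + (t12 * t12 + t11 * t22) * suv + t12 * t22 * svv
    m22 = t12 * t12 * suu + 2 * t12 * t22 * suv + t22 * t22 * svv
    return m11, m12, m22


def variance_ratio(lam, tau, sigma):
    """S1 / S2 for the combination lam[0]*X1 + lam[1]*X2."""
    l1, l2 = lam
    m11, m12, m22 = mu_from(tau, sigma)
    t11, t12, t22 = tau
    s1 = l1 * l1 * m11 + 2 * l1 * l2 * m12 + l2 * l2 * m22
    s2 = l1 * l1 * t11 + 2 * l1 * l2 * t12 + l2 * l2 * t22
    return s1 / s2


def stationary_ratios(tau, sigma):
    """Roots theta of det(mu - theta * tau) = 0, largest first,
    each paired with a (lambda1, lambda2) that attains it."""
    m11, m12, m22 = mu_from(tau, sigma)
    t11, t12, t22 = tau
    # Quadratic a*theta^2 + b*theta + c = 0; a > 0 when tau is positive definite.
    a = t11 * t22 - t12 * t12
    b = -(m11 * t22 + m22 * t11 - 2 * m12 * t12)
    c = m11 * m22 - m12 * m12
    root = math.sqrt(max(b * b - 4 * a * c, 0.0))
    results = []
    for theta in ((-b + root) / (2 * a), (-b - root) / (2 * a)):
        rows = [(m11 - theta * t11, m12 - theta * t12),
                (m12 - theta * t12, m22 - theta * t22)]
        # lambda must be orthogonal to the rows of the singular matrix;
        # use the numerically larger row.
        r = max(rows, key=lambda row: abs(row[0]) + abs(row[1]))
        results.append((theta, (r[1], -r[0])))
    return results
```

For example, `stationary_ratios((2.0, 0.5, 1.0), (1.0, 0.3, 2.0))` returns two roots; evaluating `variance_ratio` at the returned pairs reproduces the roots, and any other pair gives a ratio between them.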

In the general case with s degrees of freedom corresponding to
$u_1, u_2, \ldots, u_s$, there is an s-dimensional discriminant function
$(X_1, X_2, \ldots, X_s)$, and a set of s linear combinations for which
$R^2/(1 - R^2)$ is stationary with respect to
$\lambda_1, \lambda_2, \ldots, \lambda_s$.

The s roots (corresponding to an equation of degree s) arranged in decreasing
order, permit construction of the best one-dimensional, two-dimensional, ...,
(s - 1)-dimensional discriminant functions.







Discussion of the relevant significance tests for these reduced discriminant 
functions is beyond the scope of this paper. Reference may be made to the 
work of Hotelling and Fisher.


NON-PARAMETRIC ESTIMATION II. STATISTICALLY EQUIVALENT
BLOCKS AND TOLERANCE REGIONS--THE CONTINUOUS CASE


By John W. Tukey
Princeton University


1. Summary. Wald [2, 1943] extended the usefulness of tolerance limits to
the simplest multi-dimensional cases. His principle is here used to provide
many new ways of using a sample of n to divide the range of the population into
n + 1 blocks of known behavior. The exact tolerance distribution for the
proportions of the population covered by these blocks is extended from the case
of a continuous probability density function to the case of a continuous
cumulative distribution function. Such an extension is needed in dealing
completely with multivariate cases even where the underlying distribution is as
smooth as a multivariate normal distribution.

The devices used in Paper I [1] to extend the usefulness of tolerance limits to 
the case of a discontinuous underlying distribution will be applied in the next 
paper of this series, with some extension, to extend the usefulness of these gen- 
eral tolerance regions to the case of a discontinuous distribution. Some of these 
results specialize into new results for the univariate case, although they do not 
seem to have any immediate practical application. 

The author wishes to acknowledge the stimulation given to his work on this 
problem by Henry Scheffé, whose modesty has kept this paper from the joint 
authorship of papers I [1, Scheffé and Tukey 1945] and IV (not yet written). 


2. Introduction. Wald's great contribution to the theory of tolerance limits
was his method of successive elimination. As originally presented for a
bivariate situation it ran roughly as follows: Let
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a sample of n from an arbitrary
bivariate population. The type of tolerance region to be used is determined by
four preassigned integers, $k_1$, $k_2$, $k_3$, and $k_4$. The procedure is as
follows: Order the n observations according to their x values. Select the $k_1$
highest, and let the x coordinate of the lowest of these $k_1$ be $x_u$. Select
the $k_2$ lowest, and let the x coordinate of the highest of these $k_2$ be
$x_l$. Discard these $k_1 + k_2$ selected observations, and order the remaining
$n - k_1 - k_2$ observations according to their y values. Select the $k_3$
highest of these remaining observations, and let the y coordinate of the lowest
of these $k_3$ be $y_u$. Select the $k_4$ lowest of these remaining
observations, and let the y coordinate of the highest of these $k_4$ be $y_l$.
The tolerance region, consisting of all points (x, y) with $x_l < x < x_u$ and
$y_l < y < y_u$, depends on the sample, and, hence, so does the fraction of the
population falling in (= covered by) this region. Wald showed that the
distribution of this fraction covered was independent of the underlying
bivariate distribution, so long as this latter distribution had a continuous
probability density function. He showed that the distribution was the same as
that arising in the one-dimensional case when a tolerance region was set with
the aid of $k_1 + k_2 + k_3 + k_4$ observations. (Numerical approximation to
these distributions will be discussed in Paper IV of this series.)
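
The procedure just described can be sketched in a few lines. This is an illustrative sketch only, not code from the paper: the function name `wald_region` is hypothetical, a count of zero is allowed and simply leaves that side unbounded, and ties among coordinates are assumed not to occur.

```python
import math


def wald_region(points, k1, k2, k3, k4):
    """Wald's successive elimination for bivariate points (x, y).

    Returns (x_l, x_u, y_l, y_u); the tolerance region is
    x_l < x < x_u and y_l < y < y_u.
    """
    n = len(points)
    if min(k1, k2, k3, k4) < 0 or k1 + k2 + k3 + k4 > n:
        raise ValueError("need non-negative counts with k1+k2+k3+k4 <= n")
    by_x = sorted(points, key=lambda p: p[0])
    # Lowest x among the k1 highest, and highest x among the k2 lowest.
    x_u = by_x[n - k1][0] if k1 else math.inf
    x_l = by_x[k2 - 1][0] if k2 else -math.inf
    # Discard the k1 + k2 selected observations before looking at y.
    remaining = by_x[k2:n - k1]
    by_y = sorted(remaining, key=lambda p: p[1])
    m = len(by_y)
    y_u = by_y[m - k3][1] if k3 else math.inf
    y_l = by_y[k4 - 1][1] if k4 else -math.inf
    return x_l, x_u, y_l, y_u
```

With the eleven points `(i, (7 * i) % 11)` for `i = 0, ..., 10` and all four counts equal to 1, the x limits come from the points with x = 0 and x = 10, and the y limits from the nine points that remain, so that exactly seven sample points lie strictly inside the region.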

The important device in this process, and the one which makes the conclusion
possible, is the discarding of the $k_1 + k_2$ observations after they have
played their part by determining $x_u$ and $x_l$.

We shall shortly be able to describe this procedure of Wald’s as a special case 
of a more general procedure, but we shall first go back to the simplest one dimen- 
sional case to explain some of our notions and terminology. 

Consider the uniform distribution from 0 to 1, draw a sample of n, and let the
sample values, ordered according to size, be $t_1, t_2, \ldots, t_n$. These n
values divide the interval from 0 to 1 into the following n + 1 parts
$(0, t_1), (t_1, t_2), \ldots, (t_{n-1}, t_n), (t_n, 1)$ which we shall call
blocks. Since the joint distribution of the $t_i$ is well known, that of the
lengths of these n + 1 blocks is easily found. This distribution of lengths
would be unimportant, if it were not at the same time the distribution of the
fractions of the population covered by the blocks. As is shown later, this
distribution of fractions covered, or, more simply, of coverages, has the
following properties:

(i) the fractions covered add up to 1. 

(ii) the distribution is completely symmetrical. 

Property (ii) makes intuitive the result of Wilks [3, 1941] that the distributions
of the coverage of regions obtained

(a) by removing the $k_1 + k_2$ left-most blocks,

(b) by removing the $k_1$ left-most and the $k_2$ right-most blocks

are identical. The specific distribution obtained satisfies

(iii) if the coverages are taken as barycentric coordinates on an n-simplex,
the distribution over the simplex is uniform,

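(iv) the sum of the coverages of any k preselected blocks of the n + 1 has
the well-known distribution

$$\Pr\{\text{sum of } k \text{ coverages} < t\} = I_t(k, n - k + 1)$$

where $I_t(p, q) = \frac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)} \int_0^t x^{p-1} (1 - x)^{q-1} \, dx$
is the incomplete Beta function.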
We shall call a set of blocks, derived from a sample, whose coverages behave in
this general way a set of statistically equivalent blocks. Normally this will be
abbreviated to se-blocks. (A precise definition is given in section 4.)
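
These properties are easy to check by simulation in the uniform case. The sketch below is illustrative only (the function names, trial count, and seed are arbitrary choices, not from the paper). It cuts the interval (0, 1) into n + 1 blocks with a uniform sample and averages the total coverage of a chosen set of blocks; by the symmetry of property (ii), the average should be close to k/(n + 1) whichever k blocks are preselected.

```python
import random


def coverages(n, rng):
    """Coverages of the n + 1 blocks cut from (0, 1) by a uniform sample of n."""
    t = sorted(rng.random() for _ in range(n))
    edges = [0.0] + t + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def mean_total_coverage(n, block_indices, trials=20000, seed=1):
    """Monte Carlo mean of the summed coverage of the preselected blocks
    (block_indices are 0-based positions among the n + 1 blocks)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        c = coverages(n, rng)
        total += sum(c[i] for i in block_indices)
    return total / trials
```

For n = 9, removing the two left-most blocks (`[0, 1]`) or one block at each end (`[0, 9]`) gives mean coverages that both sit near 2/10, in line with the result of Wilks quoted above.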

We shall concentrate much of our attention on all the blocks and their sym- 
metrical character, rather than on the tolerance region formed by deleting k 
of them, since our results will then be applicable to many other problems. 

Now we can generalize Wald's original procedure. Let $W_1, W_2, \ldots, W_n$
be a sample of n (we shall not need to consider its distribution) and let
$\varphi_1, \varphi_2, \ldots, \varphi_n$ be n numerically valued functions of
W, possibly alike, possibly distinct, such that
$\varphi_1(W), \varphi_2(W), \ldots, \varphi_n(W)$ have a joint distribution.
Proceed as follows:

Order the $W_i$ according to the numbers $\varphi_1(W_i)$, select the $W_i$ for
which $\varphi_1(W_i)$ is largest and denote it by $W_{i(1)}$. The first block
contains all W such that

(2.1a) $\varphi_1(W) > \varphi_1(W_{i(1)})$.

Discarding $W_{i(1)}$, order the remaining $W_i$ according to the values of
$\varphi_2(W_i)$, and select as $W_{i(2)}$ the one giving the largest value.
The second block contains all W such that

(2.1b) $\varphi_1(W) < \varphi_1(W_{i(1)})$, $\varphi_2(W) > \varphi_2(W_{i(2)})$.

Continue this process. The mth block, for $m \le n$, will be defined by

(2.1m) $\varphi_j(W) < \varphi_j(W_{i(j)})$, $j = 1, 2, \ldots, m - 1$, $\varphi_m(W) > \varphi_m(W_{i(m)})$,

and the (n + 1)st block by

(2.1n) $\varphi_j(W) < \varphi_j(W_{i(j)})$, $j = 1, 2, \ldots, n$.

(A graphical example of this construction is given shortly.) This set of n + 1
blocks will be statistically equivalent whenever the cumulative distribution of
each $\varphi_i$ function is continuous.
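
The construction (2.1a) to (2.1n) translates directly into a short routine. The following is a sketch under stated assumptions rather than anything given in the paper: `build_blocks` and `block_of` are hypothetical names, the sample is assumed to contain no ties, and a point lying exactly on a cut (an event of probability zero in the continuous case) is simply passed on to a later block.

```python
def build_blocks(sample, phis):
    """Successive elimination with ordering functions phis[0], phis[1], ...

    Returns (cuts, block_of): cuts[j-1] is the largest value of the j-th
    function among the observations not yet discarded, and block_of(w)
    gives the 1-based index of the block containing w.
    """
    remaining = list(sample)
    used = phis[:len(sample)]          # at most one function per observation
    cuts = []
    for phi in used:
        best = max(remaining, key=phi)  # observation with the largest phi value
        cuts.append(phi(best))
        remaining.remove(best)          # discard it before the next step

    def block_of(w):
        for j, (phi, a) in enumerate(zip(used, cuts), start=1):
            if phi(w) > a:              # first cut that w exceeds
                return j
        return len(cuts) + 1            # below every cut: the last block

    return cuts, block_of
```

As a one-dimensional illustration, take the sample `[0.2, 0.5, 0.9, 0.4]` with the functions w, -w, w, -w in turn. The cuts are then 0.9, -0.2, 0.5, -0.4, and the five blocks are the intervals above 0.9, below 0.2, between 0.5 and 0.9, between 0.2 and 0.4, and between 0.4 and 0.5.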

To specialize this to the case described above, let W be a pair (x, y) of numbers
and let

(i) the first $k_1$ $\varphi$'s be the x-coordinate of W,

(ii) the next $k_2$ $\varphi$'s be minus the x-coordinate of W,

(iii) the next $k_3$ $\varphi$'s be the y-coordinate of W,

(iv) the next $k_4$ $\varphi$'s be minus the y-coordinate of W,

(v) the remaining $\varphi$'s be arbitrary.

Then the first $k_1$ blocks will contain all W for which

$$x = \varphi_j(W) > \varphi_j(W_{i(j)}), \quad j = 1, \ldots, k_1,$$

that is, for which

$$x > x_u = \varphi_{k_1}(W_{i(k_1)}).$$

Similarly, the next $k_2 + k_3 + k_4$ blocks will contain all W with

$$x < x_l,$$
$$y > y_u, \quad x_l \le x \le x_u,$$
$$y < y_l, \quad x_l \le x \le x_u,$$

respectively, and the removal of these $k_1 + k_2 + k_3 + k_4$ blocks leaves
Wald's tolerance region (plus the boundaries where $x = x_u$, $x = x_l$,
$y = y_u$, $y = y_l$).

There would be no point in this more general wording, if it did not include
new cases of some interest. We give now, in graphic terms, an example of such
a case.

We deal with a sample of n bivariate observations, which we think of as plotted
on a map so that we can use geographical language. The number n is rather
large, and we wish to construct a tolerance region by deleting 12 blocks. We
proceed as follows:

Find the most northerly point, draw an East-West line through it, and shade
the area North of the line. Find the most easterly point in the unshaded area,
draw a North-South line through it, and shade the unshaded area East of the
line. Find the most southerly point (always working in the unshaded area),
draw an East-West line through it and shade the area South of the line. Find
the most westerly point, draw a North-South line through it, and shade the area
West of the line. Find the most northeasterly point, draw a NW-SE line through
it and shade the area northeast of the line. Find the most southeasterly point,
draw a NE-SW line through it, and shade the area southeast of the line. Repeat
this 6 times more, choosing in succession the most southwesterly, northwesterly,
northerly, easterly, southerly, and westerly points. The remaining points will
now lie in an unshaded area surrounded by a polygon, which will have 8 (or
perhaps fewer) sides. The inside of this polygon is the desired tolerance region.
Figure 1 shows the final result, starting from n = 25. The practicing
statistician is invited to try an example of his own with n at least 100.
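
For a reader who would rather let a machine do the shading, the twelve steps can be sketched as follows. This is an illustration, not the author's procedure verbatim: each compass direction is encoded as a pair (a, b) so that "most northeasterly" means the largest value of a*x + b*y with (a, b) = (1, 1), and so on; the names `DIRECTIONS`, `polygon_region`, and `inside` are hypothetical, and ties are assumed absent.

```python
import random

# Each direction is (a, b) with ordering function a*x + b*y,
# listed in the order in which the text removes blocks.
DIRECTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0),      # N, E, S, W
              (1, 1), (1, -1), (-1, -1), (-1, 1),    # NE, SE, SW, NW
              (0, 1), (1, 0), (0, -1), (-1, 0)]      # N, E, S, W again


def polygon_region(points):
    """Remove 12 blocks; return the bounding half-planes and unused points.

    Each half-plane (a, b, c) stands for the unshaded side a*x + b*y < c.
    """
    remaining = list(points)
    half_planes = []
    for a, b in DIRECTIONS:
        extreme = max(remaining, key=lambda p: a * p[0] + b * p[1])
        half_planes.append((a, b, a * extreme[0] + b * extreme[1]))
        remaining.remove(extreme)       # the extreme point is discarded
    return half_planes, remaining


def inside(p, half_planes):
    """True when p lies strictly inside the unshaded polygon."""
    return all(a * p[0] + b * p[1] < c for a, b, c in half_planes)
```

Starting from n = 100 simulated points, the routine leaves 88 points, all of them strictly inside the polygon, while the 12 points used to draw the lines lie on its boundary or in the shaded area.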

Other newly accessible cases can easily be invented by the reader, after he con- 
siders this example carefully. 

The use of a single W and n functions $\varphi_i$ has two virtues; it simplifies
notation and frees the intuition, as compared with the use of n chance
quantities $Z_i = \varphi_i(W)$.

If the bivariate situation above were regarded as a 12-variate situation, where
the variates were, in order, (y, x, -y, -x, x + y, x - y, -x - y, -x + y,
y, x, -y, -x), then the original Wald procedure with
$k_1 = k_3 = \cdots = k_{23} = 1$; $k_2 = k_4 = \cdots = k_{24} = 0$ would
apply to construct the same region. Yet even if x and y had a bivariate normal
distribution, Wald's proof would not apply without extension. For the
12-dimensional distribution is highly singular (it is concentrated on a
2-dimensional plane in 12-dimensional space) and there is no hope of a density
function. An extension of Wald's result to the case where the 12-dimensional
joint cumulative distribution function is continuous (as is the case in this
example when x and y have a continuous joint cumulative) is clearly needed.

When we come to deal with the case where the cumulative need not be
continuous we shall meet a further difficulty, namely "ties". But if, as in the
present case, the cumulative is continuous, it is easy to see that the
probability that $\varphi_i(W_j) = \varphi_i(W_k)$ for any i, j, k
($j \neq k$) is zero.


3. Terminology and notation. A quantity which has a probability distribu- 
tion we call a chance quantity (it has frequently been called a random variable). 
The term chance quantity does not imply that its values are single real numbers, 
they may be single real numbers (when we also speak of a real chance quantity), 
sets of n real numbers, or more general objects. The cumulative distribution 
function, or cumulative, of a single real chance quantity, X, is defined by 


F(t) = Pr{X < t}, 


except perhaps at the discontinuities of F. We have used here the notation 
Pr{k(X)} to indicate the probability that k(X) holds, and we have followed our 
policy of using capital letters for chance quantities and the corresponding 
lower case letters for their values. 

The set of values of W, or, as we shall say, the W-set, for which, for example
$\varphi(W) < 3$, will be denoted by

$$\{W \mid \varphi(W) < 3\}.$$

We shall wish to compute probabilities associated with one or more functions
of a chance quantity; usually we will emphasize that these functions shall be
measurable with respect to the probability measure underlying the distribution
of W by asserting that they have a joint cumulative, which is defined by

$$F(t_1, t_2, \ldots, t_k) = \Pr\{\varphi_i(W) < t_i, \ i = 1, 2, \ldots, k\},$$

(except possibly at discontinuities of F) and which does not exist unless the
$\varphi_i$ are measurable with respect to the unknown underlying distribution
of W. In cases where we neglect to remind the reader, it is still assumed that
the functions are measurable.

The coverage of a W-set, which may itself be a chance quantity, is defined by

$$\text{Coverage of } S = \Pr\{W \in S\}.$$

When S is a chance quantity, its coverage is also a chance quantity. The
barycentric simplex (of dimension n) is the set of points in (n + 1)-dimensional
Euclidean space $(t_1, t_2, \ldots, t_{n+1})$ with
$t_1 + t_2 + \cdots + t_{n+1} = 1$ and $0 \le t_i \le 1$. The name comes from
the representation of the point $(t_1, t_2, \ldots, t_{n+1})$ as the center of
gravity (in mechanical terms) or mean (in statistical terms) of the
distribution where a fraction $t_i$ is concentrated at the ith vertex. (In
order, the vertices are (1, 0, 0, ..., 0), (0, 1, 0, ..., 0), etc.) The uniform
distribution on this simplex has an (n-dimensional) density

$$n! \, dt_1 \, dt_2 \cdots dt_n, \quad (0 \le t_1, t_2, \ldots, t_n; \ t_1 + t_2 + \cdots + t_n \le 1),$$

and the cumulative

$$T(x_1, x_2, \ldots, x_{n+1}) = n! \int \int \cdots \int dt_1 \, dt_2 \cdots dt_n,$$

where the integration is over the range where $0 \le t_i \le x_i$ and at the
same time

$$t_1 + t_2 + \cdots + t_n \le 1.$$


4. The blocks determined by n values of W. We deal now with a population
of W's (a probability measure $\mu$ on the space $T = \{w\}$), a family of
functions $\varphi_1, \varphi_2, \ldots, \varphi_m$ of w with a joint cumulative
(measurable with respect to $\mu$) and a set of values
$w_1, w_2, \ldots, w_n$, ($w_i \in T$).

(4.1) DEFINITION. The set $w_1, w_2, \ldots, w_n$ and the functions
$\varphi_1, \varphi_2, \ldots, \varphi_m$ define blocks as follows:

(4.2) $S_1 = \{w \mid \varphi_1(w) > a_1\}$

where $a_1 = \max_i \varphi_1(w_i) = \varphi_1(w_{i(1)})$, which defines i(1).

(4.3) $S_2 = \{w \mid \varphi_1(w) < a_1, \ \varphi_2(w) > a_2\}$,

where $a_2 = \max_{i \neq i(1)} \varphi_2(w_i) = \varphi_2(w_{i(2)})$,
$i(2) \neq i(1)$, which defines i(2). And in general, for
$1 < k \le \min(m, n)$,

(4.4) $S_k = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_{k-1}(w) < a_{k-1}, \ \varphi_k(w) > a_k\}$,

where $a_k = \max' \varphi_k(w_i) = \varphi_k(w_{i(k)})$, the maximum being
taken over all i except i(1), i(2), ..., i(k - 1); and i(k) being chosen
distinct from all i(j), j < k. If $m \ge n$, then

(4.5) $S_{n+1} = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_n(w) < a_n\}$.

If m < n, then

(4.6) $S_{m|n+1} = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_m(w) < a_m\}$.

The result of this definition is to use $w_1, \ldots, w_n$ and
$\varphi_1, \ldots, \varphi_m$ to define n + 1 blocks (one more than there are
w's) in case there are enough functions, and, in case there are not enough
functions, to define one small block, $S_i$, for each function plus one large
remainder $S_{m|n+1}$. We notice

(4.7) REMARK. The blocks of (4.1) are well defined unless
$\varphi_i(w_j) = \varphi_i(w_k)$ for some i, j, k ($j \neq k$).


5. Statement of results for the statistician. The central results can be stated
as follows:

(5.1) THEOREM $A_{m|n+1}$. If $W_1, W_2, \ldots, W_n$ are a sample of n from a
distribution, if $\varphi_1, \varphi_2, \ldots, \varphi_m$, (m < n), are m
functions such that

$$\varphi_1(W), \varphi_2(W), \ldots, \varphi_m(W)$$

have a joint distribution which has a continuous cumulative, and if the blocks
$S_1, S_2, \ldots, S_m$ and $S_{m|n+1}$ are defined as in (4.1), then

(i) the blocks are disjoint chance sets, uniquely defined with probability one,

(ii) the distribution of the coverages

$$c_i = \Pr\{w \text{ in } S_i\}, \quad i = 1, 2, \ldots, m$$

and

$$c_{m|n+1} = \Pr\{w \text{ in } S_{m|n+1}\}$$

is the same as that of $t_1, t_2, \ldots, t_m$ and
$t_{m+1} + t_{m+2} + \cdots + t_{n+1}$ where the $t_i$ are uniformly
distributed on the barycentric simplex with n + 1 vertices.

Conditions (5.1i) and (5.1ii) are the precise definition of a partial family of
statistically equivalent blocks of type n + 1 and an associated (m | n + 1)
tolerance region.

(5.2) THEOREM $B_{n+1}$. If $W_1, W_2, \ldots, W_n$ are a sample of n from a
distribution, and if $\varphi_1, \varphi_2, \ldots, \varphi_m$, ($m \ge n$), are
m functions such that

$$\varphi_1(W), \varphi_2(W), \ldots, \varphi_m(W)$$

have a joint distribution which has a continuous cumulative, and if the blocks
$S_1, S_2, \ldots, S_{n+1}$ are defined as in (4.1), then

(i) the blocks are disjoint chance sets, defined with probability one,

(ii) the distribution of the coverages

$$c_i = \Pr\{w \text{ in } S_i\}, \quad i = 1, 2, \ldots, n + 1$$

is the same as that of $t_1, t_2, \ldots, t_{n+1}$, where the $t_i$ are
uniformly distributed on the barycentric simplex with n + 1 vertices.

Conditions (5.2i) and (5.2ii) are the precise definition of a complete family of
statistically equivalent blocks. In Paper III we shall have to widen these notions
a little, and this form will then be qualified by the phrase "in the narrow sense".

6. Statement of results for the measure theorist. The construction of (4.1)
maps the product $T^n \times U^n$ into $E^{n+1}$ where T is the set of w's (and
hence $T^n$ is the set of ordered n-tuples of w's), U is the space of all
real-valued functions defined over T, measurable with respect to a fixed
probability measure $\mu$, and possessing a continuous cumulative (i.e.
$\mu(\{w \mid \varphi(w) = c\}) = 0$ for all real c), and hence $U^n$ is the
space of ordered n-tuples of such functions, and $E^{n+1}$ is Euclidean
(n + 1)-dimensional space. More precisely, the mapping is into the barycentric
simplex with n + 1 vertices, a subset of $E^{n+1}$, and is well defined except
for a set in $T^n$ of measure zero with respect to $\mu^n$, the power measure
of $\mu$. In these terms, we may restate theorem B as follows:

(6.1) THEOREM $B_{n+1}$. Hold the n functions
$\varphi_1, \varphi_2, \ldots, \varphi_n$ and the probability measure fixed,
then $T^n$ is mapped into $B_n$, and the power measure $\mu^n$ is carried by
that mapping into a measure on $B_n$. This measure is always n! times Lebesgue
measure.


7. Wald's principle. The essential principle behind Wald's process of
discarding observations is sufficiently fundamental to warrant a name of its
own. It can be stated, quite generally, in the two following forms:

(7.1) WALD'S PRINCIPLE (discrete form). Let W be a chance quantity,
and consider samples of n. Fix disjoint w-sets $A_1, A_2, \ldots, A_m, B$.
Consider those samples of n for which exactly one value falls in each $A_i$ and
the remaining n - m fall in B. The distribution of the n - m falling in B is
that of a random sample of n - m from the distribution of W restricted to B
(i.e. $\mu_B(X) = [\mu(B)]^{-1} \mu(B \cap X)$).

(7.2) WALD'S PRINCIPLE (conditional form). Let W be a chance quantity,
and $\varphi$ a function such that each value of $\varphi(W)$ has probability
zero. Consider samples of n. Then the conditional distribution of the $w_i$,
given that

$$\max_i \varphi(w_i) = a,$$

is that of one $w_{i_0}$ with $\varphi(w_{i_0}) = a$ and a sample of n - 1
other $w_i$ from the distribution of W restricted to
$B = \{w \mid \varphi(w) < a\}$.

(7.3) CENTRAL LEMMA. Let W be a chance quantity and let
$\varphi_1, \ldots, \varphi_n$ be functions with a joint cumulative such that
$\varphi_i(w) = a$ has probability zero for each i and a (i.e. the joint
cumulative is continuous). Then the conditional distribution of the remaining
n - k w's, after k blocks have been chosen according to (4.1), is that of a
sample from the distribution of W restricted to

$$B = \{w \mid \varphi_1(w) < a_1, \ldots, \varphi_k(w) < a_k\},$$

where k = 1, 2, ..., n.

The proofs of these statements are elementary and direct. To establish (7.1)
we have only to show that given two sets in $B^{n-m}$, their probabilities on
the assumption that one $w_i$ is in each $A_j$ are in the ratio of their
probabilities for an unrestricted sample of n - m. But the probability of
finding the n - m $w_i$ in a set R, contained in $B^{n-m}$, and one $w_i$ in
each $A_j$, is exactly

$$\frac{n!}{(n - m)!} \, \mu(A_1) \mu(A_2) \cdots \mu(A_m) \, [\mu(B)]^{n-m}$$

times the probability that n - m $w_i$, known to be in $B^{n-m}$, will fall in
R. This establishes (7.1).

In order to prove (7.2) we must show that the probability of a set R of
n-tuples $w_1, w_2, \ldots, w_n$ is the same whether calculated directly or
calculated by the proposed conditional distribution. To this end, it is natural
to decompose R as follows:

$$R = R(1) + R(2) + \cdots + R(n) + Z,$$

where R(i) contains those $(w_1, \ldots, w_n)$ in R for which
$\varphi(w_i) > \varphi(w_j)$ for all $j \neq i$, and Z contains the remaining
$(w_1, \ldots, w_n)$, which must involve at least one tie
$\varphi(w_j) = \varphi(w_k)$, $j \neq k$. Since Z has probability zero, it
will suffice to establish the equality of the two calculations for sets of the
form R(i), and because of symmetry we may restrict ourselves to sets of the
form R(1).

Given an integer N, we decompose the range of $\varphi(w)$ into Nn segments of
equal probability, which we may do because the cumulative of $\varphi$ is
continuous. There are then Nn values $b_i$, ($b_0 = -\infty$,
$b_{Nn} = +\infty$) such that

$$\Pr\{b_{i-1} < \varphi(w) < b_i\} = 1/Nn.$$

We now decompose our set R (which is of the form R(1)) as follows:

$$R = R_1 + \cdots + R_{Nn} + Y,$$

where $R_i$ contains those n-tuples $(w_1, \ldots, w_n)$ for which
$b_{i-1} < \varphi(w_1) < b_i$ and $\varphi(w_j) < b_{i-1}$ for all j > 1. The
remaining set Y contains n-tuples where the two largest $\varphi(w_i)$,
(i = 1 and i = $i_2$), belong to the same interval. The probability of this is
less than

$$\frac{n(n - 1)}{2} \left( \frac{1}{nN} \right) = \frac{n - 1}{2N}$$

as calculated from the known distribution. Calculating from the conditional
distribution, we find immediately a bound of the form

$$\frac{A_n}{N},$$

where $A_n$ is a constant depending only on n. Thus, as N increases, the
probability of the successive sets Y tends to zero, calculated either way. To
show the equivalence of the two calculations it is now sufficient to show that
they agree for the sets $R_i$. But this is a case of (7.1) and the lemma is
proved. Now (7.3) follows by induction, applying (7.2) at each step.


8. Proof of theorems. We notice that Theorem $B_{n+1}$ is equivalent to Theorem
$A_{n|n+1}$, since, according to (4.1), $S_{n|n+1} = S_{n+1}$.

We have only to prove theorem $A_{m|n+1}$, which we do by induction on m.
For m = 1, it is exactly Wilks' [3, 1941] original one-dimensional theorem, and
is known. Let us assume it for m = k and demonstrate it for m = k + 1, for
by induction this will complete the proof.

We must deal with the blocks $S_1, S_2, \ldots, S_k, S_{k+1}$ and
$S_{k+1|n+1}$, (notation as in (4.1) and (5.1)). We need the obvious

(8.1) LEMMA. Since the cumulative of $\varphi_{k+1}$ is continuous, the union
of $S_{k+1}$ and $S_{k+1|n+1}$ differs from $S_{k|n+1}$ by a set of zero
probability.

Hence

$$c_{k|n+1} = c_{k+1} + c_{k+1|n+1}.$$

Since we know from the induction hypothesis that $c_1, c_2, \ldots, c_k$ and
$c_{k|n+1}$ have the correct joint distribution, we have only to show that
$c_{k+1}$ and $c_1, c_2, \ldots, c_k$ have the correct joint distribution. Fix
$c_1, c_2, \ldots, c_k$. Then $a_1, a_2, \ldots, a_k$ must be fixed, and so
(7.3) applies to the n - k $w_i$'s not discarded after
$a_1, a_2, \ldots, a_k$ have been fixed. The conditional distribution of
$c_{k+1}$ must be that of a fixed number $(1 - c_1 - c_2 - \cdots - c_k)$,
which is the probability attached to $S_{k|n+1}$, times the coverage of one
block based on a sample of n - k, since the remaining n - k w's behave like a
sample.

Consider the very particular case where w is uniformly distributed between
zero and one and $\varphi_i(w) = w$. All that we have said in the last
paragraph applies: the conditional distribution of $c_{k+1}$ given
$c_1, c_2, \ldots, c_k$ is the same in the two cases, hence the joint
distribution of $c_1, c_2, \ldots, c_k, c_{k+1}$ is the same in both cases.
But in this very particular case the joint distribution is known to be that
required by theorem $A_{k+1|n+1}$.


SOME BASIC THEOREMS FOR DEVELOPING TESTS OF FIT FOR
THE CASE OF THE NON-PARAMETRIC PROBABILITY
DISTRIBUTION FUNCTION, I


By Bradford F. Kimball
State Department of Public Service, New York, N.Y.


1. Summary. In developing tests of fit based upon a sample $O_n(x_i)$ in the
case that the cumulative distribution function F(X) of the universe of X's is
not necessarily a function of a finite number of specific parameters (sometimes
known as the non-parametric case) it has been pointed out by several writers
that the "probability integral transformation" is a useful device (cf. [1]-[4]).

The author finds that a modification of this approach is more effective. This
modification is to use a transformation of ordered sample values $x_i$ from a
random sample $O_n(x_i)$ based on successive differences of the cdf values
$F(x_i)$.

A theorem is proved giving a simple formula for the expected values of the
products of powers of these differences, where all differences from 1 to n + 1
are involved in a symmetrical manner.

The moment generating function of the test function defined as the sum of m
squares of these successive differences is developed and the application of such
a test function is briefly discussed.


2. Introduction. Let the sample values $x_i$ be ordered so that

(2.1) $x_i \le x_{i+1}$, (i = 1, 2, ..., n - 1).

Let $F_r$ denote the value of the cdf F(X) associated with the rth ordered
sample value $x_r$. Thus

(2.2) $F_r = F(x_r)$.

Consider the following transformation of the ordered sample values $x_i$ based
upon the (hypothetically) known cumulative distribution function F(X) which
will be taken as a continuous function of X over its admissible range:

(2.3) $u_1 = F_1$; $\quad u_r = F_r - F_{r-1}$, (r = 2, 3, ..., n); $\quad u_{n+1} = 1 - F_n$.

The restrictions on $F_i$ are that

(2.4) $F_i \le F_{i+1}$, and $0 \le F_i \le 1$.

The above transformation (2.3) translates these conditions into the symmetrical
conditions

(2.5) $0 \le u_i$, and $u_1 + u_2 + \cdots + u_n + u_{n+1} = 1$.

A one-to-one correspondence between $u_i$ and $F_i$ exists if one of the $u_i$
be omitted, say $u_\xi$. With $u_\xi$ omitted, the Jacobian of the
transformation from $F_i$ to $u_i$ has value unity. The probability density of
the sample $O_n(x_i)$, with $x_i$ ordered, is given by

(2.6) $P[O_n(x_i)] \, dO_n = n! \, dF_1 \, dF_2 \cdots dF_n$.

Hence with $u_\xi$ omitted,

(2.7) $P[O_n(x_i)] \, dO_n = n! \, du_1 \, du_2 \cdots du_{\xi-1} \, du_{\xi+1} \cdots du_{n+1}$.

The sample space of the $u_i$ with $u_\xi$ omitted, is that portion of the
(n + 1)-dimensional Euclidean space of all the $u_i$ variables, bounded by the
coordinate hyperplanes, which is on the projection of the hyperplane (2.5) upon
the hyperplane $u_\xi = 0$. This is a region in the n-space of the $u_i$ with
$u_\xi$ omitted, bounded by the coordinate hyperplanes and the hyperplane

(2.8) $u_1 + u_2 + \cdots + u_{\xi-1} + u_{\xi+1} + \cdots + u_n + u_{n+1} = 1$.

Thus the formal integral of the pdf of the $u_i$ over sample space is

(2.9) $n! \int \cdots \int du_1 \cdots du_{\xi-1} \, du_{\xi+1} \cdots du_{n+1} = 1$

with $0 \le u_i$, and $u_i$ bounded above by the hyperplane (2.8).

It is now clear that both the pdf and the sample space of the $u_i$ (with
$u_\xi$ omitted) are symmetrical in the $u_i$. This fact leads to complete
symmetry of the joint distribution function of any set of $u_i$, over i = 1 to
n + 1 including $u_\xi$, relative to the $u_i$ selected. Other interesting
results are forthcoming.


3. Basic mathematical theorem. Using the techniques associated with the
Beta function, the expectation of the products of powers of the $u_i$ is found
to be

(3.1) $E[u_r^p \cdot u_s^q \cdot u_t^w \cdots] = \dfrac{\Gamma(n + 1) \Gamma(p + 1) \Gamma(q + 1) \Gamma(w + 1) \cdots}{\Gamma(n + p + q + w + \cdots + 1)}$

where r, s, t, etc., are any set of different indices (for the present other
than $\xi$) from the integers 1 to n + 1, and p, q, w, etc., are any real
numbers greater than minus one. The relation (3.1) can further be generalized
to the case where $u_\xi$ may be included. This will be proved for the case
n = 2, with p, q and w taken as integers. The generalization can be concluded
from inspection. Thus with

$$u_3 = 1 - u_1 - u_2,$$

$$E[u_1^p \cdot u_2^q \cdot u_3^w] = 2! \int_0^1 u_2^q \, du_2 \int_0^{1 - u_2} u_1^p (1 - u_1 - u_2)^w \, du_1$$

$$= 2! \int_0^1 u_2^q (1 - u_2)^{p + w + 1} \, du_2 \int_0^1 s^p (1 - s)^w \, ds$$

$$= \frac{2! \, p! \, w!}{(p + w + 1)!} \int_0^1 u_2^q (1 - u_2)^{p + w + 1} \, du_2 = \frac{2! \, p! \, q! \, w!}{(p + q + w + 2)!}.$$

Hence the theorem:

THEOREM. Given a random sample of n values of X from a universe with cdf
F(X) which is continuous over the range of X. With the sample values $x_i$
ordered so that $x_i \le x_{i+1}$ define a set of n + 1 variables $u_i$ as the
successive differences of $F(x_i)$ by the relations (2.3). The expected value
of the product of real powers greater than minus one of any or all of the
$u_i$, (i = 1, 2, ..., n + 1), is given by the relation (3.1) above (not
subject to the omission of $u_\xi$).
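
Relation (3.1) is simple to evaluate and to compare against simulation. The following sketch is illustrative only: the function names are hypothetical, and the simulation assumes what the probability integral transformation guarantees for continuous F, namely that the values $F(x_i)$ behave like an ordered sample from the uniform distribution on (0, 1).

```python
import math
import random


def product_moment(n, powers):
    """E[u_r^p u_s^q ...] from (3.1), for distinct indices and powers > -1."""
    if len(powers) > n + 1 or any(p <= -1 for p in powers):
        raise ValueError("need at most n + 1 powers, each greater than -1")
    log_value = (math.lgamma(n + 1)
                 + sum(math.lgamma(p + 1) for p in powers)
                 - math.lgamma(n + sum(powers) + 1))
    return math.exp(log_value)


def simulated_moment(n, powers, trials=40000, seed=3):
    """Monte Carlo estimate of the same expectation, applying the powers
    to u_1, u_2, ... of a simulated ordered uniform sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        f = sorted(rng.random() for _ in range(n))
        edges = [0.0] + f + [1.0]
        u = [b - a for a, b in zip(edges, edges[1:])]
        total += math.prod(ui ** p for ui, p in zip(u, powers))
    return total / trials
```

For n = 2 and p = q = w = 1 the formula gives 2!/5! = 1/60, in agreement with the integral worked out above.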

There are many interesting consequences of this theorem. Perhaps the most
striking is the following:

COROLLARY 1. Let a range a(m, k) for positive integer m be defined by

(3.2) $a(m, k) = F(x_{k+m}) - F(x_k)$

with k = 0, 1, 2, ..., n, and $m \le n + 1 - k$, under the convention

$$F(x_0) = 0, \quad F(x_{n+1}) = 1.$$

The probability distribution of a(m, k) is independent of k and hence is the same as
that of $F(x_m)$.

Another interesting consequence (not new) is the following:

COROLLARY 2. The correlation of $u_i$ and $u_k$, $i \neq k$, is the same for
all pairs (i, k) over the range of indices from 1 to n + 1, and has the value
-1/n.

Introducing the notation

(3.3) $[n + r]_r = (n + r)(n + r - 1) \cdots (n + 1)$,

the corollary follows from the relationships

$$E(u_i) = 1/(n + 1), \quad E(u_i^2) = 2/[n + 2]_2, \quad E(u_i u_k) = 1/[n + 2]_2.$$
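
The arithmetic behind Corollary 2 can be carried out exactly with rational numbers. The short sketch below is an illustration (the function names are hypothetical): it forms the variance and covariance from the three expectations just quoted and returns their ratio.

```python
from fractions import Fraction


def bracket(n, r):
    """[n + r]_r = (n + r)(n + r - 1) ... (n + 1), the notation of (3.3)."""
    value = Fraction(1)
    for j in range(1, r + 1):
        value *= n + j
    return value


def difference_correlation(n):
    """Exact correlation of two distinct differences u_i and u_k."""
    mean = Fraction(1, n + 1)               # E(u_i)
    second = Fraction(2) / bracket(n, 2)    # E(u_i^2)
    cross = Fraction(1) / bracket(n, 2)     # E(u_i u_k)
    variance = second - mean * mean
    covariance = cross - mean * mean
    return covariance / variance
```

The variance works out to $n / [(n + 1)^2 (n + 2)]$ and the covariance to $-1 / [(n + 1)^2 (n + 2)]$, so that the ratio is -1/n for every n.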


The fact that the correlation between any two frequency differences $u_i$ and
$u_k$ is negative leads to the following more general relationship:

COROLLARY 3. For any set of different indices i, j, k, etc., and for any
positive numbers p, q, r, etc., the expectation of the product of the powers
p, q, r, ... of $u_i, u_j, u_k, \ldots$ is less than the product of the
expectations of the powers taken separately:

(3.4)   $E[u_i^p u_j^q u_k^r \cdots] < E(u_i^p) \, E(u_j^q) \, E(u_k^r) \cdots$

This follows from generalization of the relation

$$
\frac{\Gamma(n+1)\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(r+1)}{\Gamma(n+p+q+r+1)}
< \frac{[\Gamma(n+1)]^3\,\Gamma(p+1)\,\Gamma(q+1)\,\Gamma(r+1)}{\Gamma(n+p+1)\,\Gamma(n+q+1)\,\Gamma(n+r+1)}.
$$
The above theorem suggests the possibility of test functions for fitted
distributions, relative to a universe with a cdf which, since it is merely
conditioned by a sufficient hypothesis for the theorem, may be of the
non-parametric type.

A test function of the form

(3.5)   $y = \sum u_i^p$,   p real and positive,

might first come to mind. If p = 1, compensatory effects of deviations reduce
the efficiency of the test function. One is thus led first to consider the test
function (3.5) for the case p = 2.


4. The moments of the probability distribution of $y_m = \sum u_i^2$. We are
first concerned with the problem of the determination of the moments of the
function

(4.1)   $y_m = \sum_m u_i^2$

where i ranges over any particular fixed set of m integers which for simplicity
is usually taken as the first m.

One first recalls the fact that the result is independent of which m indices have 
been selected; and that the expected value of any combination of powers is 
independent of which specific subscripts of u; are involved. 

Since the $u_i$ are correlated, principles of combinatory analysis are involved
in determining the moments of $y_m$. One possible way of obtaining the moments
is as follows:

Let $\nu_r$ denote the rth moment of $y_m$ about $y_m = 0$. Thus

(4.2)   $E[(y_m)^r] = \nu_r = E[(\sum_m u_i^2)^r].$

Now in the expansion of $(\sum_m u_i^2)^r$, the sum of the power indices of each
term is 2r. Thus referring back to (3.1) and (3.3) it will be noted that the
expected value of each such term will have the common factor

        $1/[n + 2r]_{2r}.$

Consider a general term of the expansion of $(\sum u_i^2)^r$

        $C_{r_1 r_2 \cdots r_k} \, u_1^{2r_1} u_2^{2r_2} \cdots u_k^{2r_k}$   with   $r_1 + r_2 + \cdots + r_k = r.$

Clearly

        $E(u_1^{2r_1} u_2^{2r_2} \cdots u_k^{2r_k}) = (2r_1)! \, (2r_2)! \cdots (2r_k)! / [n + 2r]_{2r}$

and the coefficient $C_{r_1 r_2 \cdots r_k}$ is the multinomial coefficient

        $\dfrac{r!}{r_1! \, r_2! \cdots r_k!}.$

Now in the expansion of $(\sum_m u_i^2)^r$ group the terms which have the same
set of k values of $r_i$, irrespective of which indices of $u_i$ are involved.
The number of such terms (since each involves k different indices) is
$\binom{m}{k}$. If $r_1, r_2, \ldots, r_k$ are all different each combination
could be taken in k! different ways. Thus with r's all different and fixed, the
sum of all coefficients of terms with same combination of $2r_i$ powers
(irrespective of variation of indices of the $u_i$) is

        $\binom{m}{k} \, k! \, \dfrac{r!}{r_1! \, r_2! \cdots r_k!}.$

This would then constitute the total multiplier for

        $(2r_1)! \, (2r_2)! \cdots (2r_k)! / [n + 2r]_{2r}$

for a given set of k r's which are all different.

If some of the r's are repeated, let $k_1, k_2, \ldots, k_s$ denote the number
of repetitions of each different $r_i$ ($k_i \ge 1$, and
$k_1 + k_2 + \cdots + k_s = k$). Then each combination of the k r's
corresponding to a set of k products could be taken in

        $k!/(k_1! \, k_2! \cdots k_s!)$

different ways. Hence the lemma:

LEMMA 1. Consider all admissible sets of k different subscripts of $u_i$ and a
fixed set of values of $r_i = r_1, r_2, \ldots, r_k$ where

        $r_1 + r_2 + \cdots + r_k = r$

such that s of these r's are different, and the number of repetitions in the
set of r's is given by $k_1, k_2, \ldots, k_s$ ($k_i \ge 1$, and
$k_1 + k_2 + \cdots + k_s = k$). The composite coefficient of the terms in
$\nu_r$ involving the factor

        $(2r_1)! \, (2r_2)! \cdots (2r_k)! / [n + 2r]_{2r}$

is given by

(4.3)   $\binom{m}{k} \, \dfrac{k!}{k_1! \, k_2! \cdots k_s!} \, \dfrac{r!}{r_1! \, r_2! \cdots r_k!}.$


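Examples of computation of $\nu_r$ by means of the above lemma. The first order
moment is given by

(4.4)   $\nu_1 = E(\sum_m u_i^2) = m \cdot 2!/[n + 2]_2.$

The second order moment is given by

        $\nu_2 = E[(\sum_m u_i^2)^2] = C_1 E(u_i^4) + C_2 E(u_i^2 u_j^2),$

and determining the values of $C_i$ from Lemma 1,

        $\nu_2 = \left[ m \cdot 4! + \binom{m}{2} \dfrac{2!}{1! \, 1!} \, 2! \, 2! \right] \Big/ [n + 4]_4,$

or

(4.5)   $\nu_2 = \left[ 24m + 8 \binom{m}{2} \right] \Big/ [n + 4]_4 = \left[ m + \dfrac{1}{3} \binom{m}{2} \right] \Big/ \binom{n+4}{4}.$

Again for the third order moment,

        $\nu_3 = E[(\sum_m u_i^2)^3] = C_1 E(u_i^6) + C_2 E(u_i^4 u_j^2) + C_3 E(u_i^2 u_j^2 u_k^2),$

and using Lemma 1,

        $\nu_3 = \left[ m \cdot 6! + \binom{m}{2} \, 2! \, \dfrac{3!}{2! \, 1!} \, 4! \, 2! + \binom{m}{3} \dfrac{3!}{3!} \, \dfrac{3!}{1! \, 1! \, 1!} \, 2! \, 2! \, 2! \right] \Big/ [n + 6]_6$

or

(4.6)   $\nu_3 = \left[ m + \dfrac{2}{5} \binom{m}{2} + \dfrac{1}{15} \binom{m}{3} \right] \Big/ \binom{n+6}{6}.$

Similarly writing the fourth moment in the form

        $\nu_4 = C_1 E(u_i^8) + C_2 E(u_i^6 u_j^2) + C_3 E(u_i^4 u_j^4) + C_4 E(u_i^4 u_j^2 u_k^2) + C_5 E(u_i^2 u_j^2 u_k^2 u_l^2)$

and using Lemma 1 it reduces to

(4.7)   $\nu_4 = \left[ m + \dfrac{2}{7} \binom{m}{2} + \dfrac{3}{35} \binom{m}{2} + \dfrac{3}{35} \binom{m}{3} + \dfrac{1}{105} \binom{m}{4} \right] \Big/ \binom{n+8}{8}.$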
Higher order moments of the probability distribution function may be com- 
puted as desired. 
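The bookkeeping of Lemma 1 lends itself to mechanical evaluation. The following is a sketch only, with hypothetical function names: it enumerates the partitions r_1 + ... + r_k = r, applies the coefficient (4.3) to each, and divides by the common factor [n + 2r]_{2r}. Exact rational arithmetic is assumed so that the results can be compared directly with (4.4) to (4.7).

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial


def partitions(r, max_part=None):
    """Yield the partitions of r as non-increasing tuples of positive integers."""
    if max_part is None:
        max_part = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, max_part), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def moment(n, m, r):
    """nu_r = E[(sum of m squared spacings)**r] for sample size n, via Lemma 1."""
    total = 0
    for parts in partitions(r):
        k = len(parts)
        if k > m:
            continue  # not enough distinct indices among the m spacings
        # Coefficient (4.3): C(m, k) * k!/(k_1! ... k_s!) * r!/(r_1! ... r_k!)
        coeff = comb(m, k) * factorial(k)
        for repetitions in Counter(parts).values():
            coeff //= factorial(repetitions)
        multinomial = factorial(r)
        for r_i in parts:
            multinomial //= factorial(r_i)
        # Numerator of E(u_1^{2 r_1} ... u_k^{2 r_k})
        expectation = 1
        for r_i in parts:
            expectation *= factorial(2 * r_i)
        total += coeff * multinomial * expectation
    common_factor = factorial(n + 2 * r) // factorial(n)  # [n + 2r]_{2r}
    return Fraction(total, common_factor)


n = 4
variance = moment(n, n + 1, 2) - moment(n, n + 1, 1) ** 2
print(variance)  # variance of y_{n+1} for n = 4
```

For m = n + 1 the difference of the second moment and the squared first moment can be compared with the variance 4n/[(n + 2)^2 (n + 3)(n + 4)] quoted in section 5.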
An alternate method of computing the moments of the distribution of this test
function is the following:

Consider a function $g_0(x)$ such that

(4.8)   $\dfrac{d^r g_0(0)}{dx^r} = (2r)!, \quad g_0(0) = 1.$

Thus

(4.9)   $E\{u^{2r}\} = [d^r g_0(0)/dx^r] / [n + 2r]_{2r}.$

From the principles of combinatory analysis of linear operators, it follows that^1

(4.10)  $E[(\sum_m u_i^2)^r] = \left\{ \dfrac{d^r [g_0(x)]^m}{dx^r} \Big|_{x=0} \right\} \Big/ [n + 2r]_{2r}.$


Although this is an enlightening analytical form, actual computations seem to be 
simpler with the use of Lemma 1. 
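For comparison, the operator form (4.10) can also be evaluated numerically by treating g_0(x) as a truncated formal power series with coefficients (2j)!/j!. This is a minimal sketch under that assumption; the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def g0_coefficients(r_max):
    """Coefficients of x**j in g0(x) = sum over j of (2j)! x**j / j!, up to r_max."""
    return [factorial(2 * j) // factorial(j) for j in range(r_max + 1)]


def truncated_product(a, b, r_max):
    """Multiply two power series, discarding powers above r_max."""
    out = [0] * (r_max + 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            if i + j <= r_max:
                out[i + j] += a_i * b_j
    return out


def moment_via_g0(n, m, r):
    """Evaluate (4.10): the r-th derivative of g0(x)**m at x = 0 over [n + 2r]_{2r}."""
    g0 = g0_coefficients(r)
    power = [1] + [0] * r
    for _ in range(m):
        power = truncated_product(power, g0, r)
    derivative_at_zero = power[r] * factorial(r)
    common_factor = factorial(n + 2 * r) // factorial(n)
    return Fraction(derivative_at_zero, common_factor)


print(moment_via_g0(1, 2, 2))  # second moment of u^2 + (1 - u)^2, u uniform
```

With n = 1 and m = 2 the test function is u^2 + (1 − u)^2 for a single uniform variable u, whose second moment about zero can be integrated directly to 7/15, which gives an independent check.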


^1 One way of seeing this is to first think of the $u_i$ as statistically
independent. The numerators of the resulting terms would be the same as in
(4.10). When the $u_i$ are taken as dependent, by virtue of (3.1) the
numerators will remain the same while all denominators will reduce to
$[n + 2r]_{2r}$.

Moment generating function. The moment generating function of the probability
distribution of $y_m$ can be written as

(4.11)  $E(e^{t y_m}) = G_0(t, m) = 1 + \sum_{r=1}^{\infty} \left\{ \dfrac{d^r [g_0(x)]^m}{dx^r} \Big|_{x=0} \right\} \dfrac{1}{[n + 2r]_{2r}} \, \dfrac{t^r}{r!}$

with

        $g_0(x) = 1 + 2! \, x + 4! \, x^2/2! + 6! \, x^3/3! + \cdots + (2r)! \, x^r/r! + \cdots$

        $[n + 2r]_{2r} = (n + 2r)(n + 2r - 1) \cdots (n + 1).$

Although $g_0(x)$ exists only as a formal power series, $G_0(t, m)$ is defined
by (4.11) as a power series with positive coefficients, converging for all t.


5. Some comments on test function, p = 2. At the present time the study of 
the test function for p = 2 has not gone far enough to justify publication of re- 
sults. One difficulty is that although its asymptotic distribution function ap- 
pears to be normal, the convergence towards normalcy may be extremely slow 
in some cases. 

Furthermore there are indications that the case m = n + 1 will give the most 
definitive results not only because the complete range of data is used, but also 
because errors of Type II would in general have a less erratic effect. 

For the case m = n + 1 the mean, variance and third and fourth reduced
moments (i.e. moments about the mean divided by corresponding power of σ)
are:

Case m = n + 1.

(5.1)
$$
\begin{aligned}
E(y_{n+1}) &= 2/(n + 2), \qquad \sigma^2 = 4n/[(n + 2)^2 (n + 3)(n + 4)], \\
\alpha_3 &= \mu_3/\sigma^3 = \frac{10n - 4}{(n + 5)(n + 6)} \left[ \frac{(n + 3)(n + 4)}{n} \right]^{1/2}, \\
\alpha_4 &= \frac{3(n^3 + 101n^2 + 14n - 8)(n + 3)(n + 4)}{n(n + 5)(n + 6)(n + 7)(n + 8)}, \\
\alpha_4 - 3 &= \frac{6(41n^4 + 241n^3 + 118n^2 - 784n - 48)}{n(n + 5)(n + 6)(n + 7)(n + 8)}.
\end{aligned}
$$


If data is not grouped the test may be applied as follows: Given a function
Q(X) which has been fitted to the cdf F(X). From a random sample of size n
with $x_i$ ordered as in (2.1) compute the successive differences of $Q(x_i)$
to obtain the variables $u_i^*$. Then consider the sum of the squares

        $U^* = \sum_{n+1} u_i^{*2}.$

If Q(X) is a true representation of F(X) the variation of U* will follow that of
$y_{n+1}$. Thus the expected value of U*, its variance etc. will be independent
of the fitted function Q(X), which represents certain advantages over the χ² test.
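The procedure just described can be sketched in a few lines. The function names below are illustrative assumptions; the fitted cdf Q is passed in as an ordinary Python callable, and the null mean and variance are those quoted in (5.1) for m = n + 1.

```python
def spacings(sample, Q):
    """Successive differences of Q at the ordered sample, with Q = 0 and Q = 1 at the ends."""
    values = sorted(Q(x) for x in sample)  # Q is non-decreasing, so this orders the sample
    edges = [0.0] + values + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def u_star(sample, Q):
    """Sum of squares of the n + 1 spacings computed from the fitted cdf Q."""
    return sum(u * u for u in spacings(sample, Q))


def null_mean(n):
    """E(y_{n+1}) = 2/(n + 2), valid when Q is the true cdf."""
    return 2.0 / (n + 2)


def null_variance(n):
    """Variance of y_{n+1} = 4n / [(n + 2)^2 (n + 3)(n + 4)], valid when Q is the true cdf."""
    return 4.0 * n / ((n + 2) ** 2 * (n + 3) * (n + 4))


# Hypothetical data, tested against the uniform cdf on (0, 1).
sample = [0.5, 0.25]
statistic = u_star(sample, lambda x: x)
print(statistic, null_mean(len(sample)), null_variance(len(sample)))
```

Comparing the computed statistic with the null mean and variance gives only a rough indication; as the text goes on to say, the distribution of $y_{n+1}$ is far from normal for small n.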





The effect of Type II errors can be roughly analyzed as follows: In considering
the effect of such errors the testing procedure must be criticized from the point
of view that

        $Q(X) \ne F(X).$

For m = n + 1 it still is true that

        $\sum u_i^* = 1$

which tends to act as a control upon U*. For example set

        $u_i^* = u_i + x_i.$

Then from the above relation it follows that

(5.2)   $\sum x_i = 0.$

Write U* as

$$
\begin{aligned}
U^* &= \sum u_i^2 + \sum x_i^2 + 2 \sum u_i x_i \\
    &= \sum u_i^2 + \sum x_i^2 + (2 \sum x_i)/(n + 1) + 2 \sum x_i \, \delta(u_i)
\end{aligned}
$$

where $\delta(u_i)$ denotes the variation of the true frequency differences from
their expected value 1/(n + 1).

The variation $\delta(u_i)$ will be to a considerable degree independent of
$x_i$. Thus the term $\sum x_i^2$ will in general tend to be larger than the last
term on the right. The third term on the right will be zero by virtue of (5.2),
and hence U* will tend to be larger than $y_{n+1}$. A similar effect upon the
sampling variance of U* can be noted. Hence an interval of rejection

(5.3)   $U^* \ge A, \quad P[y_{n+1} \le A] = \alpha = \text{confidence level},$

is pointed to.

On the other hand if m < n + 1 the condition (5.2) no longer holds, the term
$(2 \sum x_i)/(n + 1)$ in the expansion of U* above will not be zero and in
many cases would dominate the other two error terms. Thus it is easily
conceivable that one may have in the case m < n + 1

        $U_m^* < y_m$

even when the discrepancies $x_i$ are large. Hence in the case m < n + 1 choice
of confidence interval will require considerable care (see [1]).

Although the distribution of $y_{n+1}$ for small n is decidedly non-normal, if
the test function is replaced by

(5.4)   $T_{n+1} = \left( \sum [u_i - 1/(n + 1)]^2 \right)^{1/2}$

it will be found that the probability density function takes on the normal
character quite rapidly with increasing n. Indeed the author has found that a
computed approximation to the probability density function of $T_{n+1}$ with
n = 4 is decidedly normal in character.


AN ESSENTIALLY COMPLETE CLASS OF ADMISSIBLE DECISION
FUNCTIONS


By ABRAHAM WALD


Columbia University

Summary. With any statistical decision procedure (function) there will be
associated a risk function r(θ) where r(θ) denotes the risk due to possible wrong
decisions when θ is the true parameter point. If an a priori probability
distribution of θ is given, a decision procedure which minimizes the expected
value of r(θ) is called the Bayes solution of the problem. The main result in
this note may be stated as follows: Consider the class C of decision procedures
consisting of all Bayes solutions corresponding to all possible a priori
distributions of θ. Under some weak conditions, for any decision procedure T not
in C there exists a decision procedure T* in C such that r*(θ) ≤ r(θ)
identically in θ. Here r(θ) is the risk function associated with T, and r*(θ) is
the risk function associated with T*. Applications of this result to the problem
of testing a hypothesis are made.


1. Introduction. In some previous publications [1], [2] the author has
considered the following general problem of statistical inference: Let
$X = (X_1, \ldots, X_n)$ be a set of chance variables. Suppose that the only
information we have concerning the joint distribution function F of these chance
variables is that F is an element of a given class Ω of distribution functions.
Suppose, furthermore, that a class D of possible decisions d is given one of which
is to be made on the basis of an observation $x = (x_1, \ldots, x_n)$ on the
chance vector X. The problem is then to construct a function d(x), called
statistical decision function, which associates with each sample point x an
element d(x) of D so that the decision d(x) is made when the sample point x is
observed. A statistical decision function d(x) is defined over all possible
points x of the sample space and for each sample point x the value of the
function is an element of D. Each element d of D will usually be interpreted as a
decision to accept the hypothesis that the unknown distribution F of X belongs to
a certain subclass ω of Ω. Different elements d of D correspond to different
subclasses ω of Ω.

The problem of testing the hypothesis H that the unknown distribution function
F belongs to a given subclass ω of Ω, is contained as a special case in the
above general problem. The space D will then contain only two elements,
$d_1$ and $d_2$, where $d_1$ denotes the decision of accepting H and $d_2$
denotes the decision of rejecting H.

As in [1] and [2], we shall assume also here that Ω is a k-parameter family of
distribution functions. Then each element of Ω may be represented by a point
$\theta = (\theta_1, \ldots, \theta_k)$, called parameter point, in the
k-dimensional Cartesian space. The class Ω is then represented by a subset of the
k-dimensional Cartesian space, called parameter space. We shall, therefore, refer
to Ω as the parameter space and to its elements as parameter points.

The merits of any particular decision function d(x) will usually depend on
the relative importance of the various possible errors caused by not selecting
the proper element d of D. The relative importance of such errors has been
described in [1] and [2] by a weight function W(θ, d) defined over the product of
Ω and D. For any pair (θ, d) the value of W(θ, d) is non-negative and expresses
the loss caused by taking the decision d when θ is the true parameter point.
For any given decision function d(x) the expected value of the loss is given by

(1.1)   $r(\theta) = \int_M W(\theta, d(x)) \, dF(x)$

where M denotes the sample space and F(x) is the joint cumulative distribution of
$X = (X_1, \ldots, X_n)$ corresponding to the parameter point θ.

The function r(θ) is defined over the parameter space Ω and is called the risk
function. The shape of the risk function r(θ) will, in general, be affected by the
decision function d(x) used. To put this dependence in evidence, we shall use
the symbol r[θ | d(x)] to denote the risk function r(θ) associated with the
decision function d(x).

A decision function d(x) is said to be uniformly better than the decision
function d*(x) if

(1.2)   $r[\theta \mid d(x)] \le r[\theta \mid d^*(x)]$

for all θ and if there exists at least one point θ for which the inequality sign
holds in (1.2). A decision function d(x) is said to be admissible if no other
uniformly better decision function exists.

A class C of admissible decision functions will be said to be essentially complete
if for any decision function d(x) not in C there exists a decision function d*(x)
in C such that

        $r[\theta \mid d^*(x)] \le r[\theta \mid d(x)]$

for all θ.

In section 2 we shall formulate certain assumptions which will then be used 
in section 3 to derive an essentially complete class of admissible decision func- 
tions. In section 4 applications are made to the problem of testing a hypothesis. 

In a recent paper Lehmann [3] obtained an essentially complete class of
admissible tests for each hypothesis H of a certain restricted class of simple
hypotheses. The restrictions imposed on Ω in Lehmann's paper are essentially
those formulated by Neyman [4], [5] to insure the existence of the type $A_1$
(uniformly most powerful unbiassed) test. Our definition of an essentially
complete class of admissible decision functions agrees with that given by Lehmann
when the problem is to test a hypothesis and the weight function W(θ, d) can
take only the values 0 and 1.


2. Assumptions. Throughout this paper we shall make the following
assumptions:

Assumption 1: The parameter space Ω is a bounded and closed subset of a
finite dimensional, say k-dimensional, Cartesian space.

We shall introduce the following convergence definition in the space D: a
sequence $\{d_m\}$, (m = 1, 2, ..., ad inf.), of elements of D is said to
converge to the element d of D if

        $\lim_{m \to \infty} W(\theta, d_m) = W(\theta, d)$

uniformly in θ.

Assumption 2: The space D is compact and, for any d, W(θ, d) is a continuous
function of θ.

Assumption 3: For any point θ of Ω the joint distribution function of
$X = (X_1, \ldots, X_n)$ admits a density function p(x, θ) for all points x of
the n-dimensional Cartesian space M (sample space). The density function p(x, θ)
is assumed to be continuous in x and θ jointly.

In what follows we shall mean by a distribution function f(θ) of θ a cumulative
distribution function for which $\int_\Omega df(\theta) = 1$ and for which
$\int_\Omega W(\theta, d) \, df(\theta)$ is not zero identically in d.

Assumption 4: For any point x of M, except perhaps for a set of measure
zero, and for any cumulative distribution function f(θ) there exists one and
only one element of D for which the expression

(2.1)   $\int_\Omega W(\theta, d) \, p(x, \theta) \, df(\theta)$

takes its minimum value with respect to d.

Assumptions 1 and 3 in this paper are exactly the same as Assumptions 1 and 3
in [2]. The formulation of Assumptions 2 and 4 is somewhat different from
that given in [2]. This is mainly due to the fact that in [2] the space D has the
same elements as Ω, while here this is not necessarily so. It can be verified
without difficulty that this slight modification of the assumptions does not
affect in any way the validity of the results obtained in [2]. Thus, we shall be
able to make use of any theorems proved in [2] for the purposes of the present
paper.


3. Derivation of an essentially complete class of admissible decision
functions. For any distribution function f(θ) defined over Ω and for any sample
point x let d(x, f) denote the element of D for which the expression (2.1) takes
its minimum value. It follows easily from the definition of r(θ) and d(x, f) that

(3.1)   $\int_\Omega r[\theta \mid d(x, f)] \, df(\theta) \le \int_\Omega r[\theta \mid d^*(x)] \, df(\theta)$

for any decision function d*(x). If we interpret f(θ) as an a priori probability
distribution of θ, inequality (3.1) says that the expected value of r(θ) takes its
minimum value for the decision function d(x, f). We shall refer to d(x, f) as
the Bayes' solution of the problem corresponding to the a priori probability
distribution f(θ).

We shall now prove the following theorem.

THEOREM 3.1. The class C of all Bayes' solutions d(x, f) corresponding to all
possible a priori distributions f(θ) is an essentially complete class of admissible
decision functions.

PROOF. First we show that for any distribution f(θ) the decision function
d(x, f) is admissible. Let d(x) be a decision function such that

        $r[\theta \mid d(x)] \le r[\theta \mid d(x, f)]$

for all θ. Then

(3.2)   $\int_\Omega r[\theta \mid d(x)] \, df(\theta) \le \int_\Omega r[\theta \mid d(x, f)] \, df(\theta).$

From the definition of d(x, f) it follows that the equality sign must hold in
(3.2), i.e.,

(3.3)   $\int_\Omega r[\theta \mid d(x)] \, df(\theta) = \int_\Omega r[\theta \mid d(x, f)] \, df(\theta).$

From the second half of Theorem 4.2 in [2] it then follows that

        $r[\theta \mid d(x)] = r[\theta \mid d(x, f)]$

for all θ. Hence d(x, f) is an admissible decision function.

We shall now show that the class C of decision functions d(x, f) corresponding
to all possible a priori distributions f(θ) is essentially complete. Let $d_0(x)$
be any decision function not in the class C. The essential completeness of the
class C is proved if we can show that there exists a distribution f(θ) such that

(3.4)   $r[\theta \mid d(x, f)] \le r[\theta \mid d_0(x)]$

for all θ.

To prove (3.4) we shall consider the weight function

(3.5)   $W^*(\theta, d) = W(\theta, d) - r[\theta \mid d_0(x)] + \max_\theta r[\theta \mid d_0(x)].$

The maximum of $r[\theta \mid d_0(x)]$ exists, since according to Theorem 4.1 in
[2] $r[\theta \mid d_0(x)]$ is a continuous function of θ. Clearly, Assumptions
1-4 remain valid if we replace W(θ, d) by W*(θ, d). Let r*[θ | d(x)] denote the
risk function associated with the decision function d(x) if the weight function is
given by W*(θ, d). According to Theorem 5.2 in [2] there exists a decision
function d*(x) such that

(3.6)   $\max_\theta r^*[\theta \mid d^*(x)] \le \max_\theta r^*[\theta \mid d(x)]$

for any decision function d(x). Since

        $\max_\theta r^*[\theta \mid d_0(x)] = \max_\theta r[\theta \mid d_0(x)]$

it follows from (3.6) that

(3.7)   $\max_\theta r^*[\theta \mid d^*(x)] \le \max_\theta r[\theta \mid d_0(x)].$

Inequalities (3.5) and (3.7) imply

(3.8)   $r[\theta \mid d^*(x)] \le r[\theta \mid d_0(x)]$

for all θ.


For any distribution f(θ) we shall denote by d*(x, f) the Bayes solution of
the problem corresponding to the a priori distribution f(θ) when the weight
function is given by W*(θ, d). Since W*(θ, d) − W(θ, d) depends only on θ
but not on d, one can easily verify that d*(x, f) = d(x, f). It follows from
Theorems 4.4 and 5.1 in [2] that there exists a distribution f(θ), the so-called
least favorable distribution, such that (3.6) remains valid if we replace d*(x)
by d*(x, f). Thus we can put

(3.9)   $d^*(x) = d^*(x, f) = d(x, f).$

Hence, from (3.8) we obtain

        $r[\theta \mid d(x, f)] \le r[\theta \mid d_0(x)]$

for all θ. This completes the proof of Theorem 3.1.


4. Applications to the problem of testing a hypothesis. In this section we
shall apply the results of the preceding section to the problem of testing the
hypothesis H that the true parameter point is included in a given subset ω of Ω.
We shall assume that ω is an open subset of Ω. The space D consists now only
of two elements, $d_1$ and $d_2$, where $d_1$ denotes the decision of accepting H
and $d_2$ denotes the decision of rejecting H.

We shall assume that $W(\theta, d_1)$ is equal to zero for points θ in the
interior or on the boundary of ω, and positive elsewhere. Similarly,
$W(\theta, d_2)$ will be assumed to be positive for points θ inside ω and zero
outside ω. For any a priori distribution f(θ) the Bayes solution is given by the
following test: We reject the hypothesis H if (and only if)^1

(4.1)   $\int_\Omega W(\theta, d_1) \, p(x, \theta) \, df(\theta) > \int_\Omega W(\theta, d_2) \, p(x, \theta) \, df(\theta).$

Thus, the class C of regions (4.1), corresponding to all possible distributions
f(θ), is an essentially complete class of admissible critical regions.

^1 Whether the equality sign is included or not in (4.1) is of no consequence,
since by Assumption 4 the measure of the set of points x for which the equality
holds in (4.1) is zero.

For any critical region R we shall denote the probability that the sample x
will fall in R when θ is true by P(θ | R). It follows from Lemma 4.4 in [2] and
Assumption 3 that P(θ | R) is a continuous function of θ for any region R.
Since $W(\theta, d_1)$ is positive in the interior of Ω − ω, and
$W(\theta, d_2)$ is positive in ω, the class C of regions defined in (4.1) will
have the following properties:

(a) For any region R outside the class C there exists a region R* in C such that

        P(θ | R*) ≤ P(θ | R) in ω
and
        P(θ | R*) ≥ P(θ | R) in Ω − ω.

(b) If R and R* are members of C such that

        P(θ | R*) ≤ P(θ | R) in ω
and
        P(θ | R*) ≥ P(θ | R) in Ω − ω,
then
        P(θ | R*) = P(θ | R) for all θ.
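The Bayes test (4.1) is easy to illustrate when the a priori distribution is concentrated on finitely many parameter points, so that the integrals reduce to sums. The sketch below is an illustration under assumptions that are not in the paper: a single normally distributed observation with unit variance, a two-point prior, and weight functions taking only the values 0 and 1. All names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_density(x, theta):
    """p(x, theta): normal density with mean theta and unit variance (assumed model)."""
    return exp(-0.5 * (x - theta) ** 2) / sqrt(2.0 * pi)


def reject(x, prior, loss_accept, loss_reject, density=normal_density):
    """Test (4.1) for a discrete prior given as {theta: probability}.

    loss_accept(theta) plays the role of W(theta, d_1) and
    loss_reject(theta) the role of W(theta, d_2).
    """
    accept_side = sum(w * loss_accept(t) * density(x, t) for t, w in prior.items())
    reject_side = sum(w * loss_reject(t) * density(x, t) for t, w in prior.items())
    return accept_side > reject_side


# Hypothetical example: omega is the set theta < 0.5, prior mass on 0 and 1.
prior = {0.0: 0.5, 1.0: 0.5}
loss_accept = lambda t: 1.0 if t >= 0.5 else 0.0  # loss of accepting H outside omega
loss_reject = lambda t: 1.0 if t < 0.5 else 0.0   # loss of rejecting H inside omega

print(reject(0.6, prior, loss_accept, loss_reject))
print(reject(0.4, prior, loss_accept, loss_reject))
```

With equal prior weights and 0-1 weights the rule reduces to comparing the two densities, so H is rejected exactly when x exceeds the midpoint 0.5.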


For any distribution g(θ) consider the critical region consisting of all sample
points x satisfying

(4.2)   $\int_{\Omega - \omega} p(x, \theta) \, dg(\theta) > \int_{\omega} p(x, \theta) \, dg(\theta).$

Let C* be the class of regions (4.2) corresponding to all possible distributions
g(θ). One can easily verify that any region in C is also a member of C*. Thus,
the following theorem holds:

THEOREM 4.1. Suppose that Assumptions 1 and 3 are fulfilled and ω is an open
subset of Ω. Suppose, furthermore, that for any distribution g(θ) the set of
sample points x satisfying the equation

        $\int_{\Omega - \omega} p(x, \theta) \, dg(\theta) = \int_{\omega} p(x, \theta) \, dg(\theta)$

has the measure zero. Then, for any region R outside the class C* there will be a
region R* in C* such that

        P(θ | R*) ≤ P(θ | R) in ω
and
        P(θ | R*) ≥ P(θ | R) in Ω − ω.

Addition at proof reading: After this paper was sent to the printer, the author
obtained a generalization of Theorem 3.1 to sequential decision functions, as well
as some other results. They will appear in a forthcoming issue of Econometrica.


DISCRIMINATING BETWEEN BINOMIAL DISTRIBUTIONS


By PAUL G. HOEL
University of California at Los Angeles

1. Summary. Given a set of k random samples, $x_1, x_2, \ldots, x_k$, from a
binomial distribution with parameters p and n, it is shown that the familiar
binomial index of dispersion

        $z = \dfrac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{\bar{x}(1 - \bar{x}/n_0)}$

yields an approximate best critical region independent of p for testing the
hypothesis $n = n_0$ against the alternative hypothesis $n > n_0$, provided
$\bar{x}$ and $n_0 - \bar{x}$ are not small. Because of the nature of the test,
its optimum properties also apply to testing whether the data came from a
binomial population with $n = n_0$ or from a Poisson population.


2. Introduction. A problem of considerable interest in certain fields is that 
of deciding whether a set of observations should be treated as having come from 
either a binomial population or from a Poisson population. Although there was 
much discussion a few years ago concerning the best method for making such a 
decision [1], [2], [3], no solution of the problem was presented. In this paper a 
test that possesses certain optimum properties is derived for discriminating 
between two binomial populations. This test, however, is also capable of solving 
the problem of how to discriminate between a binomial and a Poisson population. 
The methods that are employed in the derivation of this test are similar to those 
of an earlier paper [4] in which the problem of discriminating between two Poisson 
populations was studied. 


3. Similar regions. Let n denote the number of trials and p the probability
of success in a single trial for a binomial distribution. Let
$x_1, x_2, \ldots, x_k$ represent the observed frequencies in k random samples
from this binomial population. Now consider the two alternative hypotheses

        $H_0: n = n_0, \ p = p_0$
and
        $H_1: n = n_1 > n_0, \ p = p_1.$

The purpose of this paper is to construct a test for discriminating between the
two values of n regardless of the values of p; however it is convenient to begin
with these more restrictive hypotheses.

For the purpose of finding a critical region for testing $H_0$ against $H_1$,
the $x_i$ will be treated as the coordinates of a point in k dimensions. The
probability of obtaining the particular point $x_1, \ldots, x_k$ when $H_0$ is
true will be denoted by $P_0[x_i]$. Since the probability of obtaining x
successes in n trials is given by

        $\dfrac{n!}{x! \, (n - x)!} \, p^x q^{n-x}$

it follows that

(1)     $P_0[x_i] = \dfrac{(n_0!)^k \, p_0^{\sum x_i} \, q_0^{\sum (n_0 - x_i)}}{\prod x_i! \, (n_0 - x_i)!}.$


In searching for a critical region that will be independent of $p_0$, it is
illuminating to study the methods that were designed by Neyman and Pearson [5]
for continuous distributions. These methods suggest that one should look for
critical regions on the surfaces $\sum_1^k x_i = \text{constant}$. For this
reason, instead of using (1) for constructing critical regions, it is desirable to
study the conditional probability distribution of the points lying in the plane
$\sum x_i = N$, where N is a positive integer not exceeding $kn_0$. The
conditional probability of obtaining the point $x_1, \ldots, x_k$ when the point
is restricted to lie in the plane $\sum x_i = N$ will be denoted by
$P_0[x_i \mid N]$. Its value may be obtained by dividing the probability (1) by
the probability that the point will lie in the plane $\sum x_i = N$. If this
latter probability is denoted by $P_0[N]$, then

(2)     $P_0[x_i \mid N] = \dfrac{P_0[x_i]}{P_0[N]}.$

Since the sum of k independent variables each possessing the same binomial
distribution has a binomial distribution with n replaced by kn, it follows that N
possesses a binomial distribution and that

(3)     $P_0[N] = \dfrac{(kn_0)!}{N! \, (kn_0 - N)!} \, p_0^N q_0^{kn_0 - N}.$

If (1) and (3) are substituted in (2), it will reduce to

(4)     $P_0[x_i \mid N] = \dfrac{(n_0!)^k \, N! \, (kn_0 - N)!}{(kn_0)! \, \prod x_i! \, (n_0 - x_i)!}.$

This conditional probability distribution in the plane $\sum_1^k x_i = N$ is
independent of $p_0$ and therefore may serve as the basis for constructing a
critical region that is independent of $p_0$ for testing $H_0$ against $H_1$. It
will therefore be possible to test the less restrictive hypothesis

        $H_0': n = n_0$
against
        $H_1': n = n_1 > n_0.$
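The independence of (4) from $p_0$ can be confirmed with exact rational arithmetic. The sketch below (function names are illustrative assumptions) computes the conditional probability (2) directly from (1) and (3) for two different values of p and compares both with the closed form (4).

```python
from fractions import Fraction
from math import comb, factorial, prod


def joint_prob(xs, n0, p):
    """Probability (1) of the point xs under a binomial law with parameters n0 and p."""
    q = 1 - p
    return prod(comb(n0, x) * p ** x * q ** (n0 - x) for x in xs)


def sum_prob(N, k, n0, p):
    """Probability (3) that the k observations sum to N."""
    return comb(k * n0, N) * p ** N * (1 - p) ** (k * n0 - N)


def conditional_prob(xs, n0, p):
    """Conditional probability (2): ratio of (1) to (3)."""
    return joint_prob(xs, n0, p) / sum_prob(sum(xs), len(xs), n0, p)


def formula_4(xs, n0):
    """Closed form (4), which contains no p."""
    k, N = len(xs), sum(xs)
    numerator = factorial(n0) ** k * factorial(N) * factorial(k * n0 - N)
    denominator = factorial(k * n0) * prod(factorial(x) * factorial(n0 - x) for x in xs)
    return Fraction(numerator, denominator)


xs, n0 = [1, 2, 0], 3
print(conditional_prob(xs, n0, Fraction(1, 3)))
print(conditional_prob(xs, n0, Fraction(3, 4)))
print(formula_4(xs, n0))
```

All three printed values agree, which is the property that makes the planes $\sum x_i = N$ suitable for building a region whose size does not depend on $p_0$.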


4. Best critical region. Although a best critical region does not exist for
testing $H_0'$ against $H_1'$, it is helpful to proceed as though one did.

If a critical region of size α could be selected in each plane
$\sum_1^k x_i = N$, $(N = 0, 1, \ldots, kn_0)$, then the totality of such
critical regions would constitute a critical region of size α that is independent
of $p_0$ and which therefore could be used to test $H_0'$ against $H_1'$. For, if
$P_0[X \in \text{C.R.}]$ denotes the probability that the sample point, which
will be denoted by X, will lie in the critical region, it follows that

(5)
$$
\begin{aligned}
P_0[X \in \text{C.R.}] &= \sum_{N=0}^{kn_0} P_0[N] \, P_0[X \in \text{C.R.} \mid N] \\
 &= \alpha \sum_{N=0}^{kn_0} P_0[N] \\
 &= \alpha.
\end{aligned}
$$

This last equality follows from the fact that the sample point must lie in one of
the planes $\sum x_i = N$, $(N = 0, 1, \ldots, kn_0)$.


Furthermore, this would be the only critical region of size α independent of
$p_0$, because if a critical region of size $\alpha_N$,
$(N = 0, 1, \ldots, kn_0)$, were selected in the plane $\sum_1^k x_i = N$,
$(N = 0, 1, \ldots, kn_0)$, it would be necessary that

        $\sum_{N=0}^{kn_0} P_0[N] \, \alpha_N = \alpha,$

independent of the value of $p_0$. From (3) this is equivalent to requiring that

(6)     $\sum_{N=0}^{kn_0} \dfrac{(kn_0)!}{N! \, (kn_0 - N)!} \, p_0^N (1 - p_0)^{kn_0 - N} \alpha_N = \alpha,$

independent of the value of $p_0$. Since the left side of (6) is a polynomial in
$p_0$, its constant term must equal α and all other coefficients must vanish. It
will be observed that no terms of the sum in (6) that arise from N > r will
contribute to the coefficient of $p_0^r$; consequently this coefficient will not
contain the unknowns $\alpha_{r+1}, \ldots, \alpha_{kn_0}$. These considerations
show that the $\alpha_N$ must satisfy equations of the form

$$
\begin{aligned}
\alpha &= c_{00} \alpha_0 \\
0 &= c_{10} \alpha_0 + c_{11} \alpha_1 \\
 &\;\;\vdots \\
0 &= c_{kn_0 0} \alpha_0 + c_{kn_0 1} \alpha_1 + \cdots + c_{kn_0 kn_0} \alpha_{kn_0}.
\end{aligned}
$$

It will also be observed that $c_{rr} = (kn_0)!/r! \, (kn_0 - r)!$; consequently
the triangular matrix of the coefficients in these $kn_0 + 1$ non-homogeneous
equations is non-singular. The equations therefore possess a unique solution,
namely the known solution of $\alpha_N = \alpha$.


The preceding discussion shows that it is necessary to find critical regions of
size α in each plane $\sum_1^k x_i = N$, $(N = 0, 1, \ldots, kn_0)$, if a
critical region independent of $p_0$ is desired. If each such planar critical
region were a best critical region for that plane, then the totality of such
regions would constitute a best critical region independent of $p_0$ for testing
$H_0'$ against $H_1'$.

It follows from the theory of best critical regions [5] that if a best critical
region in the plane $\sum_1^k x_i = N$ did exist, it would be determined by the
inequality

(7)     $\dfrac{P_0[x_i \mid N]}{P_1[x_i \mid N]} \le K$

where $P_1$ corresponds to $P_0$ when $H_1$ is true and where K is a constant
whose value is chosen to make the critical region one of size α. Now from (4),

(8)     $\dfrac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = \dfrac{(n_0!)^k \, (kn_0 - N)! \, (kn_1)! \, \prod (n_1 - x_i)!}{(n_1!)^k \, (kn_1 - N)! \, (kn_0)! \, \prod (n_0 - x_i)!}.$

In order to study the possibility of a best critical region, it is therefore
necessary to study the possibility of (8) satisfying inequality (7).


5. Approximate best critical region. Unfortunately, because the variables $x_i$
are discrete, it is not possible to find critical regions of exactly size α for
arbitrary α as required in (5). Consequently it is necessary to introduce
continuous approximating functions for discrete probability functions or to
resort to other devices if critical regions of the type discussed in the preceding
section are to be obtained.

For the purpose of introducing such approximations, (8) will be written in
the following form:

(9)     $\dfrac{P_0[x_i \mid N]}{P_1[x_i \mid N]} = c_1 \, \dfrac{\dfrac{(kn_0 - N)!}{\prod (n_0 - x_i)!} \left( \dfrac{1}{k} \right)^{kn_0 - N}}{\dfrac{(kn_1 - N)!}{\prod (n_1 - x_i)!} \left( \dfrac{1}{k} \right)^{kn_1 - N}},$

where $c_1$ is independent of the variables $x_i$. It will be observed that the
ratio on the right is a ratio of two multinomial functions. Now the multinomial
function

        $\dfrac{N!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$

where $\sum_1^k x_i = N$, can be approximated by the multivariate normal function

(10)    $\dfrac{\exp\left[ -\frac{1}{2} \sum_1^k \left( \dfrac{x_i - N p_i}{\sqrt{N p_i}} \right)^2 \right]}{(2 \pi N)^{(k-1)/2} \sqrt{p_1 p_2 \cdots p_k}}.$

The approximation is good provided the $N p_i$ are large and the $x_i$ remain away
from their extreme values. If this approximation is applied to both numerator
and denominator of (9), to this order of approximation,

(11)
$$
\begin{aligned}
\frac{P_0[x_i \mid N]}{P_1[x_i \mid N]}
 &= c_1 \, \frac{\exp\left[ -\frac{1}{2} \sum_1^k \left( \dfrac{x_i - N/k}{\sqrt{n_0 - N/k}} \right)^2 \right] \Big/ [2 \pi (kn_0 - N)]^{(k-1)/2}}{\exp\left[ -\frac{1}{2} \sum_1^k \left( \dfrac{x_i - N/k}{\sqrt{n_1 - N/k}} \right)^2 \right] \Big/ [2 \pi (kn_1 - N)]^{(k-1)/2}} \\
 &= c_1 \left[ \frac{kn_1 - N}{kn_0 - N} \right]^{(k-1)/2} \exp\left[ -\frac{1}{2} \, \frac{n_1 - n_0}{(n_0 - N/k)(n_1 - N/k)} \sum_1^k (x_i - N/k)^2 \right].
\end{aligned}
$$


Since, by hypothesis, m. > mp and m > N/k, except for the case of nm) = N/k, 
which will be considered later, it follows that 














mM — % 


a - Nae - 8 * 





As a consequence, the right side of (11) will decrease in value as $\sum_{1}^{k}(x_i - N/k)^2$ increases in value. If $(x_1, \cdots, x_k)$ is a point lying on the sphere

$$\sum_{1}^{k}(x_i - N/k)^2 = r^2 \tag{12}$$

and if the coordinates of this point satisfy inequality (7) when approximation (11) is used, then all points outside this sphere will also satisfy (7) to this same order of approximation. A best critical planar region of size $\alpha$ in this approximate sense can therefore be obtained in the plane $\sum_{1}^{k} x_i = N$ by determining a


DISCRIMINATING BETWEEN BINOMIAL DISTRIBUTIONS 561 


sphere with center at $(N/k, \cdots, N/k)$ such that when $H_0$ is true the probability is $\alpha$ that a point lying in the plane will lie outside this sphere. Furthermore, such a region will be a common best critical region for all values of $n_1 > n_0$ because the preceding arguments do not require the value of $n_1$ but merely the knowledge that $n_1 > n_0$.

For the purpose of determining the radius of the sphere that will yield the desired critical region, (4) will be expressed as follows:

$$P_0[x_i \mid N] = c_2\left[\frac{N!}{\prod x_i!}\left(\frac{1}{k}\right)^{N}\right]\left[\frac{(kn_0 - N)!}{\prod (n_0 - x_i)!}\left(\frac{1}{k}\right)^{kn_0 - N}\right], \tag{13}$$

where $c_2$ is independent of the $x_i$. If these multinomials are replaced by their multivariate normal approximations as given by (10), to this approximation (13) will reduce to

$$P_0[x_i \mid N] = c_3\exp\left[-\frac{1}{2}\sum_{1}^{k}\frac{(x_i - N/k)^2}{N/k}\right]\exp\left[-\frac{1}{2}\sum_{1}^{k}\frac{(x_i - N/k)^2}{n_0 - N/k}\right]$$

$$= c_3\exp\left[-\frac{1}{2}\,\frac{\sum_{1}^{k}(x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)}\right], \tag{14}$$

where $c_3$ is independent of the $x_i$. Since $\sum_{1}^{k} x_i = N$ here, $x_k$ may be expressed in terms of the remaining variables; consequently (14), except for a constant factor, may be treated as a normal distribution in the variables $x_1, \cdots, x_{k-1}$. If the factorials in $c_3$ are replaced by their Stirling approximations, it will be found that $c_3$ is the correct constant for the normal distribution.

Since it is known [6] that $-2$ times the exponent in a normal distribution function possesses a chi-square distribution, it follows that to this order of approximation

$$\frac{\sum_{1}^{k}(x_i - N/k)^2}{\dfrac{N}{k}\left(1 - \dfrac{N}{kn_0}\right)} \tag{15}$$

possesses a chi-square distribution with $k - 1$ degrees of freedom. If $\chi^2_\alpha$ is a value such that $P[\chi^2 > \chi^2_\alpha] = \alpha$, then

$$\sum_{1}^{k}(x_i - N/k)^2 = \frac{N}{k}\left(1 - \frac{N}{kn_0}\right)\chi^2_\alpha \tag{16}$$

determines a sphere such that to this order of approximation the probability is $\alpha$ that a point lying in the plane $\sum_{1}^{k} x_i = N$ will lie outside the sphere. From the arguments following (12), it therefore follows that a common best critical region in this approximate sense for testing $H_0$ against $H_1$ will consist of that part of each plane $\sum_{1}^{k} x_i = N$, $(N = 0, 1, \cdots, kn_0)$, which lies outside the corresponding sphere given by (16). Since the $x_i$ are non-negative and do not exceed $n_0$, the planes corresponding to $N = 0$ and $N = kn_0$ contain a single point; therefore it is necessary to adopt some convention that assigns $100\alpha$ percent of the samples with $N = 0$ and $N = kn_0$ to a critical region in order to obtain critical regions of size $\alpha$ in these two cases.

For a given set of data, the procedure to be followed then consists in calculating the statistic

$$z = \frac{\sum_{1}^{k}(x_i - \bar{x})^2}{\bar{x}\left(1 - \dfrac{\bar{x}}{n_0}\right)},$$

where $\bar{x} = \sum_{1}^{k} x_i / k$, and agreeing to reject the hypothesis that $n = n_0$ in favor of the alternative hypothesis that $n > n_0$ if and only if $z > \chi^2_\alpha$, where $P[\chi^2 > \chi^2_\alpha] = \alpha$ for $k - 1$ degrees of freedom. Because of the nature of the approximations used in (10) and (14), this result may be expected to be accurate only if $\bar{x}$ and $n_0 - \bar{x}$ are large.

The interesting feature of this result is that the familiar binomial index of dispersion, $z$, possesses optimum properties in this approximate sense for testing $n = n_0$ against $n > n_0$.
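
The test procedure of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample counts are hypothetical, and the chi-square point is passed in by the caller (9.488 is the tabulated upper 5 percent point for 4 degrees of freedom) rather than computed.

```python
def binomial_dispersion_index(xs, n0):
    """Return z = sum((x - xbar)^2) / (xbar * (1 - xbar / n0))."""
    k = len(xs)
    xbar = sum(xs) / k
    if not 0 < xbar < n0:
        # Corresponds to the degenerate planes N = 0 and N = k * n0.
        raise ValueError("statistic is undefined when xbar is 0 or n0")
    ss = sum((x - xbar) ** 2 for x in xs)
    return ss / (xbar * (1 - xbar / n0))


def reject_n_equals_n0(xs, n0, chi2_alpha):
    """Reject n = n0 in favor of n > n0 when z exceeds the chi-square point."""
    return binomial_dispersion_index(xs, n0) > chi2_alpha


# Hypothetical data: k = 5 counts, hypothesized n0 = 10.
counts = [3, 5, 4, 6, 2]
z = binomial_dispersion_index(counts, n0=10)
# chi2_alpha is the upper 5 percent point of chi-square with k - 1 = 4 d.f.
decision = reject_n_equals_n0(counts, n0=10, chi2_alpha=9.488)
```

With these made-up counts the mean is 4 and the sum of squared deviations is 10, so the statistic falls well below the critical value and the hypothesis $n = n_0$ would not be rejected. The mean here is small, so the approximation itself would be doubtful for such data.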


6. Poisson application. Since the preceding test will possess approximate 
optimum properties for n as large as desired, independent of the value of p, 
and since a Poisson distribution with parameter m can be approximated as 
closely as desired by means of a binomial distribution with np = m by allowing 
n to increase sufficiently, it follows that the test will also possess approximate 
optimum properties for deciding between a binomial distribution with n = no 
and a Poisson distribution. 


7. Estimation of n. Although the purpose of this paper has been accomplished in the preceding sections, it is interesting to observe the role played by the closely related Poisson index of dispersion in the estimation of $n$.

Approximate confidence limits for $n$ may be obtained by means of (16). If $\chi^2_{1-\alpha}$ is a value of $\chi^2$ such that $P[\chi^2 > \chi^2_{1-\alpha}] = 1 - \alpha$, then, to this same order of approximation, the probability is $1 - 2\alpha$ that

$$\chi^2_{1-\alpha} < \frac{\sum_{1}^{k}(x_i - \bar{x})^2}{\bar{x}\left(1 - \dfrac{\bar{x}}{n}\right)} < \chi^2_\alpha.$$

If these inequalities are solved for $n$, the following $100(1 - 2\alpha)$ percent approximate confidence limits for $n$ will be obtained:

$$\frac{\bar{x}}{1 - \dfrac{\sum (x_i - \bar{x})^2}{\bar{x}\,\chi^2_\alpha}} < n < \frac{\bar{x}}{1 - \dfrac{\sum (x_i - \bar{x})^2}{\bar{x}\,\chi^2_{1-\alpha}}}. \tag{17}$$

Only the lower limit here will possess optimum properties. Now it will be observed that only positive values of $n$ will be admissible if

$$\frac{\sum (x_i - \bar{x})^2}{\bar{x}} < \chi^2_{1-\alpha},$$

whereas only negative values will be admissible if

$$\frac{\sum (x_i - \bar{x})^2}{\bar{x}} > \chi^2_\alpha.$$

The range of values will be infinite in each case if there is equality rather than inequality. If, however,

$$\chi^2_{1-\alpha} < \frac{\sum (x_i - \bar{x})^2}{\bar{x}} < \chi^2_\alpha,$$

then both positive and negative values of $n$ over infinite ranges will be admissible. Since $n$ increases as the Poisson index $\sum (x_i - \bar{x})^2/\bar{x}$ increases until it becomes infinite and then increases from minus infinity through negative values, (17) may still be thought of as giving an interval (infinite) of values with a positive "lower" limit and a negative "upper" limit. Thus, the familiar Poisson index of dispersion plays an interesting role in determining whether a Poisson assumption is reasonable as far as admissible values of $n$ are concerned.

If the population is truly binomial, negative values of n must be ruled out; 
consequently a Poisson assumption becomes increasingly tenable as the Poisson 
index increases. However, experience has shown [7] that a negative binomial 
distribution is often more realistic in describing data supposedly drawn from a 
binomial or Poisson population than is the assumed distribution; consequently 
a negative binomial should be given consideration if (17) yields only negative 
values or if it yields a negative “upper” limit that is numerically small relative 
to a positive ‘lower” limit. 

It is also interesting to consider the point estimation of $n$. Here, it is customary [7] to estimate $n$ by means of

$$\frac{k\bar{x}}{k - \dfrac{\sum (x_i - \bar{x})^2}{\bar{x}}}.$$

Thus, a positive, infinite, or negative estimate for $n$ will be obtained according as the Poisson index is less than, equal to, or greater than $k$.
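
The limits and the point estimate of this section can be illustrated by a minimal Python sketch. The function names and the data are hypothetical, the chi-square points are supplied by the caller (9.488 and 0.711 are the tabulated upper 5 and 95 percent points for 4 degrees of freedom), and an infinite limit is returned when a denominator vanishes.

```python
import math


def poisson_index(xs):
    """Poisson index of dispersion: sum((x - xbar)^2) / xbar."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / xbar


def _solve_for_n(xbar, index, chi2_point):
    # n = xbar / (1 - index / chi2_point); infinite when the denominator is zero.
    denom = 1 - index / chi2_point
    return math.inf if denom == 0 else xbar / denom


def n_confidence_limits(xs, chi2_alpha, chi2_one_minus_alpha):
    """Return the ("lower", "upper") limits for n obtained by solving for n."""
    xbar = sum(xs) / len(xs)
    index = poisson_index(xs)
    return (_solve_for_n(xbar, index, chi2_alpha),
            _solve_for_n(xbar, index, chi2_one_minus_alpha))


def n_point_estimate(xs):
    """Customary estimate k * xbar / (k - Poisson index)."""
    k = len(xs)
    xbar = sum(xs) / k
    denom = k - poisson_index(xs)
    return math.inf if denom == 0 else k * xbar / denom


counts = [3, 5, 4, 6, 2]  # hypothetical data, k = 5
lower, upper = n_confidence_limits(counts, chi2_alpha=9.488,
                                   chi2_one_minus_alpha=0.711)
estimate = n_point_estimate(counts)
```

For these made-up counts the Poisson index is 2.5, which lies between the two chi-square points, so the sketch returns a positive "lower" limit and a negative "upper" limit, the mixed case described in the text. The point estimate is 8 because the index is less than $k = 5$.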
BILINEAR FORMS IN NORMALLY CORRELATED VARIABLES


By Allen T. Craig
University of Iowa


1. Summary. If a variable x is normally distributed with mean zero, we have
previously given a necessary and sufficient condition (see references at end of 
this paper) for the independence of two real symmetric quadratic forms in n 
independent values of that variable. This condition is that the product of the 
matrices of the forms should vanish. In the present paper, we have proved 
that the same algebraic condition is both necessary and sufficient for the inde- 
pendence of two real symmetric bilinear, or a real symmetric bilinear and 
quadratic form, in normally correlated variables. 


2. Introduction. In this paper, we determine the moment generating function 
of the joint distribution of two real symmetric bilinear forms in certain normally 
correlated variables and derive a necessary and sufficient condition for the 
independence, in the probability sense, of these forms. We further investigate 
the condition for independence, in the probability sense, of real symmetric 
bilinear and quadratic forms. 


3. The moment generating function of the distribution of real symmetric bilinear forms. Let the two variables $x$ and $y$ have a joint normal distribution with means zero, unit variances and correlation coefficient $\rho$. From this bivariate distribution, repeated random samples of $n$ pairs, say $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$, are drawn. Let $C = \| c_{jk} \|$ be a real symmetric matrix and write $\theta = \sum\sum c_{jk} x_j y_k$. The moment generating function of the distribution of $\theta$ is then given by

$$\varphi(t) = E[e^{t\theta}] = \frac{1}{\left(2\pi\sqrt{1 - \rho^2}\right)^{n}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{t\theta - Q}\,dy_n\,dx_n\cdots dy_1\,dx_1,$$

where

$$Q = \frac{1}{2(1 - \rho^2)}\sum (x_j^2 + y_j^2 - 2\rho x_j y_j)$$

and $\theta$ is defined above. If we subject the $x$'s and $y$'s to the same linear homogeneous transformation with appropriately chosen orthogonal matrix $L$, then $Q$ remains invariant and $\theta$ becomes $\sum \lambda_j x_j y_j$ where the $\lambda$'s are the $n$ real roots of the characteristic equation of $C$, that is, of $|C - \lambda I| = 0$. The integrations are then easily effected and we find that

$$\varphi(t) = \prod_{j=1}^{n}\{[1 - t(\rho + 1)\lambda_j][1 - t(\rho - 1)\lambda_j]\}^{-1/2}$$

$$= \{|I - t(\rho + 1)C| \cdot |I - t(\rho - 1)C|\}^{-1/2}$$

$$= |I - 2\rho t C - (1 - \rho^2)t^2 C^2|^{-1/2},$$

where $I$ is the unit matrix of order $n$ and the vertical bars, as usual, indicate the determinant of the enclosed matrix.

Next, let $A = \| a_{jk} \|$ and $B = \| b_{jk} \|$ be two real symmetric matrices each of order $n$. Write $\theta_1 = \sum\sum a_{jk} x_j y_k$ and $\theta_2 = \sum\sum b_{jk} x_j y_k$ where the $x$'s and $y$'s are the items of the sample randomly drawn from the bivariate distribution previously described. The moment generating function of the joint distribution of $\theta_1$ and $\theta_2$ is then given by

$$\varphi(t_1, t_2) = E[e^{t_1\theta_1 + t_2\theta_2}] = \frac{1}{\left(2\pi\sqrt{1 - \rho^2}\right)^{n}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{t_1\theta_1 + t_2\theta_2 - Q}\,dy_n\,dx_n\cdots dy_1\,dx_1,$$

where $\theta_1$, $\theta_2$, and $Q$ have the meanings previously assigned to them. If we pursue a line of reasoning similar to that above, we find that

$$\varphi(t_1, t_2) = |I - 2\rho(t_1 A + t_2 B) - (1 - \rho^2)(t_1 A + t_2 B)^2|^{-1/2}.$$


4. The independence of bilinear forms. It is clear that there exist positive numbers, say $h_1$ and $h_2$, such that $\varphi(t_1, t_2)$ exists for $0 \le t_1 < h_1$ and $0 \le t_2 < h_2$. It is well known that a necessary and sufficient condition for the independence of $\theta_1$ and $\theta_2$ is that $\varphi(t_1, t_2)$ shall factor into the product $\varphi(t_1, 0)\varphi(0, t_2)$. If then, we assume $\theta_1$ and $\theta_2$ to be independent, we have essentially

$$|I - 2\rho(t_1 A + t_2 B) - (1 - \rho^2)(t_1 A + t_2 B)^2| = |I - 2\rho t_1 A - (1 - \rho^2)t_1^2 A^2| \cdot |I - 2\rho t_2 B - (1 - \rho^2)t_2^2 B^2|.$$

If $h$ denotes the smaller of $h_1$ and $h_2$, then the factored form holds for $0 \le t_1, t_2 < h$, and hence for all real values of $t_1$ and $t_2$. In particular it holds for $t_2 = t_1$, so that

$$|I - 2\rho t_1(A + B) - (1 - \rho^2)t_1^2(A + B)^2| = |I - 2\rho t_1 A - (1 - \rho^2)t_1^2 A^2| \cdot |I - 2\rho t_1 B - (1 - \rho^2)t_1^2 B^2|. \tag{1}$$


Let $r_1$, $r_2$, and $r \le r_1 + r_2$ denote the ranks of the matrices $A$, $B$, and $A + B$. Further let the real non-zero roots of the characteristic equations of these matrices be denoted respectively by $\alpha_1, \alpha_2, \cdots, \alpha_{r_1}$, $\beta_1, \beta_2, \cdots, \beta_{r_2}$, and $\gamma_1, \gamma_2, \cdots, \gamma_r$. Then the members of the preceding equation may be written

$$\prod_{j=1}^{r}[1 - t_1(\rho + 1)\gamma_j][1 - t_1(\rho - 1)\gamma_j]$$

and

$$\prod_{j=1}^{r_1}[1 - t_1(\rho + 1)\alpha_j][1 - t_1(\rho - 1)\alpha_j]\prod_{j=1}^{r_2}[1 - t_1(\rho + 1)\beta_j][1 - t_1(\rho - 1)\beta_j]$$

respectively. It is seen that the left member is a polynomial in $t_1$ of degree $2r$ and that the right member is a polynomial in $t_1$ of degree $2(r_1 + r_2)$. Accordingly, $r = r_1 + r_2$ and the roots $\gamma_1, \cdots, \gamma_r$ consist of the roots $\alpha_1, \cdots, \alpha_{r_1}$, $\beta_1, \cdots, \beta_{r_2}$. That is, if $\theta_1$ and $\theta_2$ are independent, then the rank of $A + B$ is the sum of the ranks of $A$ and $B$ and the non-zero roots of the characteristic equation of $A + B$ consist of those of the characteristic equation of $A$ together with those of $B$. Further, if in (1) we put $t_2 = vt_1$, where $v$ is real, we have

$$|I - 2\rho t_1(A + vB) - (1 - \rho^2)t_1^2(A + vB)^2| = |I - 2\rho t_1 A - (1 - \rho^2)t_1^2 A^2| \cdot |I - 2\rho t_1 vB - (1 - \rho^2)t_1^2 v^2 B^2|.$$

Denote the rank of $A + vB$ by $r'$ and the non-zero roots of its characteristic equation by $\delta_1, \cdots, \delta_{r'}$. The immediately preceding equation can then be written

$$\prod_{j=1}^{r'}[1 - t_1(\rho + 1)\delta_j][1 - t_1(\rho - 1)\delta_j] = \prod_{j=1}^{r_1}[1 - t_1(\rho + 1)\alpha_j][1 - t_1(\rho - 1)\alpha_j]\prod_{j=1}^{r_2}[1 - t_1(\rho + 1)v\beta_j][1 - t_1(\rho - 1)v\beta_j].$$

From this we infer that, apart from zero roots, the roots of the characteristic equation of $A + vB$ are $\alpha_1, \cdots, \alpha_{r_1}, v\beta_1, \cdots, v\beta_{r_2}$.


If a symmetric matrix, say $M(v)$, has elements which are real polynomials in the real variable $v$, and if the determinant

$$|M(v) - \lambda I| = (-1)^n[\lambda - p_1(v)][\lambda - p_2(v)]\cdots[\lambda - p_n(v)],$$

where $p_1(v), p_2(v), \cdots, p_n(v)$ are likewise real polynomials in $v$, then there exists, for all real values of $v$, a real orthogonal matrix, say $L(v)$, such that

$$L'(v)M(v)L(v) = \begin{Vmatrix} p_1(v) & 0 & \cdots & 0 \\ 0 & p_2(v) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & p_n(v) \end{Vmatrix}.$$

Furthermore¹, $\dfrac{dL(v)}{dv}$ exists for all real values of $v$. Since

$$|A + vB - \lambda I| = (-1)^n\lambda^{n - r_1 - r_2}(\lambda - \alpha_1)\cdots(\lambda - \alpha_{r_1})(\lambda - v\beta_1)\cdots(\lambda - v\beta_{r_2}),$$


¹ A number of years ago, in connection with another problem, the writer sought the assistance of Professor N. H. McCoy for a proof that $L(v)$ is differentiable at $v = 0$. Professor McCoy's elegant demonstration of the existence of $L(v)$ showed that each element of this orthogonal matrix is itself a real polynomial in $v$, divided by the positive square root of another real polynomial, which polynomial is never negative and which vanishes for no real value of $v$. Thus the derivative of $L(v)$ exists not only for $v = 0$ but for all real values of $v$. The writer thanks Professor McCoy for his kind and generous assistance.
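then $A + vB$ belongs to the class $M(v)$ so we have

$$L'(v)(A + vB)L(v) = \operatorname{diag}(\alpha_1, \cdots, \alpha_{r_1}, v\beta_1, \cdots, v\beta_{r_2}, 0, \cdots, 0), \tag{2}$$

where $\operatorname{diag}(\cdots)$ denotes the diagonal matrix with the listed diagonal elements. In particular,

$$L'(0)AL(0) = \operatorname{diag}(\alpha_1, \cdots, \alpha_{r_1}, 0, \cdots, 0). \tag{3}$$

If we differentiate (2) with respect to $v$ and subsequently set $v = 0$, we have

$$\frac{dL'(0)}{dv}AL(0) + L'(0)BL(0) + L'(0)A\frac{dL(0)}{dv} = \operatorname{diag}(0, \cdots, 0, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0). \tag{4}$$
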
Since $L(v)$ is orthogonal, then $L'(v)L(v) = I$. Upon differentiating both members with respect to $v$, and subsequently setting $v = 0$, it is seen that

$$\frac{dL'(0)}{dv}L(0) = -L'(0)\frac{dL(0)}{dv},$$

so that $L'(0)\dfrac{dL(0)}{dv}$ is a skew-symmetric matrix, say $S$. Further

$$\frac{dL'(0)}{dv} = -L'(0)\frac{dL(0)}{dv}L'(0) = -SL'(0), \tag{5}$$

and, by taking conjugates,

$$\frac{dL(0)}{dv} = -L(0)\frac{dL'(0)}{dv}L(0) = L(0)S. \tag{6}$$

If we multiply (5) on the right by $AL(0)$ and (6) on the left by $L'(0)A$, we see that (4) may be written

$$L'(0)BL(0) = \operatorname{diag}(0, \cdots, 0, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0) + SL'(0)AL(0) - L'(0)AL(0)S. \tag{7}$$

Since $S$ is skew-symmetric and since $L'(0)AL(0)$ is given by (3), then each element on the principal diagonal of $SL'(0)AL(0)$ and $L'(0)AL(0)S$ is zero. Further, since $L'(0)BL(0)$ is symmetric, then $L'(0)BL(0)$ takes the form

$$L'(0)BL(0) = \operatorname{diag}(0, \cdots, 0, \beta_1, \cdots, \beta_{r_2}, 0, \cdots, 0) + \| k_{ij} \|, \qquad k_{ij} = k_{ji},\quad k_{ii} = 0.$$

Because the non-zero roots of the characteristic equation of $L'(0)BL(0)$ are $\beta_1, \cdots, \beta_{r_2}$, then the sum of all two-rowed principal minors of the determinant of $L'(0)BL(0)$ must equal the sum of the products of $\beta_1, \cdots, \beta_{r_2}$ taken two at a time. That is

$$\sum_{i<j}\beta_i\beta_j = \sum_{i<j}\beta_i\beta_j - \sum_{i<j}k_{ij}^2,$$

so that each $k_{ij}$, being real, is zero. Accordingly, $SL'(0)AL(0) - L'(0)AL(0)S$ is a zero matrix and $L'(0)BL(0)$ is given by the first term in the right member of (7). We then have

$$L'(0)AL(0)L'(0)BL(0) = L'(0)ABL(0) = 0,$$

from which it follows that $AB = 0$. Thus, if the real symmetric bilinear forms $\theta_1$ and $\theta_2$ are independent in the probability sense, the product of their matrices is zero.

If, conversely, $AB = 0$, then

$$\varphi(t_1, t_2) = |I - 2\rho(t_1 A + t_2 B) - (1 - \rho^2)(t_1^2 A^2 + t_2^2 B^2)|^{-1/2}$$

$$= |[I - 2\rho t_1 A - (1 - \rho^2)t_1^2 A^2][I - 2\rho t_2 B - (1 - \rho^2)t_2^2 B^2]|^{-1/2}$$

$$= \varphi(t_1, 0)\varphi(0, t_2),$$

and $\theta_1$ and $\theta_2$ are independent. This establishes the following theorem.

THEOREM I. Let $x$ and $y$ be normally correlated with means zero, unit variances, and correlation coefficient $\rho$. Let $\theta_1$ and $\theta_2$ be two real symmetric bilinear forms in $n$ random pairs of values of $x$ and $y$, say $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. A necessary and sufficient condition that $\theta_1$ and $\theta_2$ be independent in the probability sense is that the product of the matrices of the forms be zero.


5. Simultaneous reduction of quadratic or bilinear forms. The argument of Section 4 may be used to establish in a very simple manner the following theorem.

THEOREM II. Let $A$ and $B$ be two real symmetric matrices with constant elements, each matrix of order $n$. A necessary and sufficient condition that there exist a real orthogonal matrix $L$ of order $n$ such that simultaneously each of $L'AL$ and $L'BL$ is in canonical form, wherein no non-zero elements occupy corresponding positions on the principal diagonals, is that $AB = 0$.

For if such an orthogonal matrix $L$ exists, it is evident that $L'ALL'BL = L'ABL = 0$ from which it follows that $AB = 0$. Conversely, if $AB = 0$, then $v$ being a real scalar, the matrix $(A - \lambda I)(vB - \lambda I)$ is equal to the matrix $-\lambda[(A + vB) - \lambda I]$. These matrices being equal, their determinants are equal so that $A + vB$ belongs to the class $M(v)$ of section 4. Thus $L$ may be taken as $L(0)$ and simultaneously $L'AL$ and $L'BL$ are of the form stated in the theorem.


6. Independence of bilinear and quadratic forms. Let $\theta = \sum\sum a_{jk} x_j y_k$ be a real symmetric bilinear form of rank $r_1$ in the previously defined variables $(x_1, y_1), \cdots, (x_n, y_n)$ and let $q = \sum\sum b_{jk} x_j x_k$ be a real symmetric quadratic form of rank $r_2$ in $x_1, x_2, \cdots, x_n$. As usual, denote the non-zero roots of the characteristic equations of $A$ and $B$ by $\alpha_1, \alpha_2, \cdots, \alpha_{r_1}$ and $\beta_1, \beta_2, \cdots, \beta_{r_2}$ respectively. The moment generating function of the joint distribution of $\theta$ and $q$ is

$$\varphi(t_1, t_2) = \frac{1}{\left(2\pi\sqrt{1 - \rho^2}\right)^{n}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{t_1\theta + t_2 q - Q}\,dy_n\,dx_n\cdots dy_1\,dx_1,$$

where, as previously,

$$Q = \frac{1}{2(1 - \rho^2)}\sum (x_j^2 + y_j^2 - 2\rho x_j y_j).$$

We first orthogonally transform the variables so that the exponent in the integrand becomes, upon writing $\| f_{jk} \| = L'BL$,

$$t_1\sum \alpha_j x_j y_j + t_2\sum\sum f_{jk} x_j x_k - \frac{1}{2(1 - \rho^2)}\sum (x_j^2 + y_j^2 - 2\rho x_j y_j).$$

We then integrate on $y_1, y_2, \cdots, y_n$ and obtain for the exponent in the integrand

$$t_2\sum\sum f_{jk} x_j x_k - \tfrac{1}{2}\sum x_j^2 + \rho t_1\sum \alpha_j x_j^2 + \frac{1 - \rho^2}{2}t_1^2\sum \alpha_j^2 x_j^2.$$

If we effect on the variables $x_1, x_2, \cdots, x_n$ the inverse of the orthogonal transformation initially used on the $x$'s and $y$'s, the exponent in the integrand becomes, using $\| g_{jk} \| = A^2$,

$$t_2\sum\sum b_{jk} x_j x_k - \tfrac{1}{2}\sum x_j^2 + \rho t_1\sum\sum a_{jk} x_j x_k + \frac{1 - \rho^2}{2}t_1^2\sum\sum g_{jk} x_j x_k$$

or

$$-\tfrac{1}{2}\sum\sum[\delta_{jk} - 2\rho t_1 a_{jk} - (1 - \rho^2)t_1^2 g_{jk} - 2t_2 b_{jk}]x_j x_k,$$

where $\delta_{jk}$ equals 1 or 0 according as $j$ does or does not equal $k$. Hence,

$$\varphi(t_1, t_2) = |I - 2\rho t_1 A - (1 - \rho^2)t_1^2 A^2 - 2t_2 B|^{-1/2}. \tag{8}$$

If $\theta$ and $q$ are independent, we have

$$|I - 2\rho t_1 A - (1 - \rho^2)t_1^2 A^2 - 2t_2 B| = |I - 2\rho t_1 A - (1 - \rho^2)t_1^2 A^2| \cdot |I - 2t_2 B|, \tag{9}$$

for $0 \le t_1 < h_1$ and $0 \le t_2 < h_2$. As before, the members of (9) are polynomials which, being equal for $0 \le t_1, t_2 < h$, are equal for all real values of $t_1$ and $t_2$. If we put $t_1 = 1$ and $t_2 = vt_1 = v$, where $v$ is real, then (9) becomes

$$|I - 2\rho A - (1 - \rho^2)A^2 - 2vB| = |I - 2\rho A - (1 - \rho^2)A^2| \cdot |I - 2vB|$$

$$= \prod_{j=1}^{r_1}[1 - (\rho - 1)\alpha_j][1 - (\rho + 1)\alpha_j]\prod_{j=1}^{r_2}(1 - 2v\beta_j).$$
That is,

$$|2\rho A + (1 - \rho^2)A^2 + 2vB - \lambda I| = (-1)^n\lambda^{n - r_1 - r_2}[\lambda - 2\rho\alpha_1 - (1 - \rho^2)\alpha_1^2]\cdots[\lambda - 2\rho\alpha_{r_1} - (1 - \rho^2)\alpha_{r_1}^2]\,[\lambda - 2v\beta_1]\cdots[\lambda - 2v\beta_{r_2}]$$

so that $2\rho A + (1 - \rho^2)A^2 + 2vB$ is a matrix of the class $M(v)$. Hence we write

$$L'(v)[2\rho A + (1 - \rho^2)A^2 + 2vB]L(v) = \operatorname{diag}\left(2\rho\alpha_1 + (1 - \rho^2)\alpha_1^2, \cdots, 2\rho\alpha_{r_1} + (1 - \rho^2)\alpha_{r_1}^2,\; 2v\beta_1, \cdots, 2v\beta_{r_2},\; 0, \cdots, 0\right).$$

The argument of section 4 shows that $L'(0)[2\rho A + (1 - \rho^2)A^2]L(0)L'(0)2BL(0)$ is a zero matrix, from which it follows that $2\rho AB + (1 - \rho^2)A^2 B = 0$. But this imposes on $\rho$, $n^2$ conditions of the form

$$2\rho\, l_{jk} + (1 - \rho^2)m_{jk} = 0, \qquad (j, k = 1, 2, \cdots, n).$$

Since these hold for every $-1 < \rho < 1$, they hold identically. Hence each $l_{jk}$ and $m_{jk}$ is zero. In particular, $\| l_{jk} \| = AB = 0$ if $\theta$ and $q$ are independent. Conversely, if $AB = 0$, we see by Theorem II that (8) becomes

$$\varphi(t_1, t_2) = \varphi(t_1, 0)\varphi(0, t_2),$$

so that $\theta$ and $q$ are independent. This yields Theorem III.

THEOREM III. Let $x$ and $y$ be normally correlated with means zero, unit variances, and correlation coefficient $\rho$. Let $\theta$ be a real symmetric bilinear form in the $n$ random pairs of values of $x$ and $y$, say $(x_1, y_1), \cdots, (x_n, y_n)$, and let $q$ be a real symmetric quadratic form in $x_1, x_2, \cdots, x_n$ (or $y_1, \cdots, y_n$). A necessary and sufficient condition that $\theta$ and $q$ be independent in the probability sense is that the product of the matrices of the forms be zero.

For example, let $\theta$ be $n$ times the sample covariance and let $q$ be $n$ times the square of the mean of the $x$'s. Then

$$\theta = \sum (x_j - \bar{x})(y_j - \bar{y}) = \sum\sum a_{jk} x_j y_k,$$

where

$$a_{jk} = 1 - \frac{1}{n}, \quad j = k; \qquad a_{jk} = -\frac{1}{n} \quad \text{otherwise},$$

and

$$q = n\bar{x}^2 = \sum\sum b_{jk} x_j x_k; \qquad b_{jk} = 1/n \quad \text{for } j, k = 1, 2, \cdots, n.$$

Since $AB = 0$, then $\theta$ and $q$ are independent, a fact otherwise known but perhaps not so easily established.
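
The matrices in this example can be checked directly. The following Python sketch is illustrative only: the order $n = 4$, the helper `matmul`, and the sample values are assumptions chosen for the check, and exact rational arithmetic is used so that the zero product is not obscured by rounding.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two square lists-of-lists."""
    size = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(size)) for j in range(size)]
            for i in range(size)]


n = 4  # hypothetical sample size
# A has 1 - 1/n on the diagonal and -1/n elsewhere; B has every element 1/n.
A = [[(1 if j == k else 0) - Fraction(1, n) for k in range(n)] for j in range(n)]
B = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]

AB = matmul(A, B)
product_is_zero = all(entry == 0 for row in AB for entry in row)

# Check that the bilinear form with matrix A reproduces sum (x - xbar)(y - ybar).
xs = [Fraction(v) for v in (1, 2, 4, 7)]   # made-up values
ys = [Fraction(v) for v in (2, 0, 3, 5)]
xbar, ybar = sum(xs) / n, sum(ys) / n
theta_direct = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
theta_form = sum(A[j][k] * xs[j] * ys[k] for j in range(n) for k in range(n))
```

Each row of $A$ sums to zero and every column of $B$ is constant, which is why the product vanishes for any $n$, not only for the value used here.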
ON THE CHARLIER TYPE B SERIES


By S. Kullback

George Washington University


1. Introduction. The Type B series of Charlier has been discussed in some detail in the literature (see references at the end of the paper). The problem of the convergence of the Type B series has been considered by Pollaczek-Geiringer [12], [13], Szegö [12] (page 110), Uspensky [16], Jacob [5], Schmidt [16] and Obrechkoff [11]. There is presented in the following a method of development of the Type B series which is believed to be of some interest, including a necessary and sufficient condition for the convergence which is basically the same as that of Schmidt [16]. A result of Steffensen [17] is extended and shown to be related to the Charlier Type B series.


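2. Statement of results. Consider the function $p(r)$, defined for $r = 0, 1, 2, \cdots$, and such that

$$\sum_{r=0}^{\infty} p(r) = 1; \qquad \sum_{r=0}^{\infty} |p(r)| = A, \tag{2.1}$$

where $A$ is some finite value. Let the $n$-th factorial moment be defined by

$$\mu_{(0)} = 1, \qquad \mu_{(n)} = \sum_{r=0}^{\infty} r(r - 1)(r - 2)\cdots(r - n + 1)p(r), \quad (n = 1, 2, \cdots). \tag{2.2}$$

For arbitrary $\lambda$ let

$$L_n = \mu_{(n)} - n\mu_{(n-1)}\lambda + \frac{n(n - 1)}{2!}\mu_{(n-2)}\lambda^2 - \cdots + (-1)^n\lambda^n. \tag{2.3}$$

We prove the following results:

THEOREM. A necessary and sufficient condition that the function $p(r)$ of (2.1) may be expressed by the absolutely convergent series

$$p(r) = \frac{e^{-\lambda}\lambda^r}{r!} + L_1\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots + \frac{L_n}{n!}\frac{\partial^n}{\partial\lambda^n}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots \tag{2.4}$$

is that

$$1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots \tag{2.5}$$

converges, where $L_n$ is defined as in (2.3).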
3. Generating functions. For the function $p(r)$ of (2.1) consider the generating function defined by

$$\varphi(z) = \sum_{r=0}^{\infty} z^r p(r), \tag{3.1}$$

where $z$ is a complex variable. Because of (2.1) it is clear that the right member of (3.1) is uniformly and absolutely convergent for $|z| \le 1$ so that the radius of convergence of (3.1) is some value $R_1 \ge 1$.

The Taylor expansion of $\varphi(z)$ about the point $z = 1$ is given by

$$\varphi(z) = \varphi(1) + (z - 1)\varphi'(1) + \frac{(z - 1)^2}{2!}\varphi''(1) + \cdots, \tag{3.2}$$

where, as may be readily obtained from (3.1),

$$\varphi^{(n)}(1) = \sum_{r=0}^{\infty} r(r - 1)(r - 2)\cdots(r - n + 1)p(r) = \mu_{(n)}. \tag{3.3}$$

If it is assumed that (2.5) converges, then

$$\varphi(z) = 1 + (z - 1)\mu_{(1)} + \frac{(z - 1)^2}{2!}\mu_{(2)} + \cdots + \frac{(z - 1)^n}{n!}\mu_{(n)} + \cdots \tag{3.4}$$

is uniformly and absolutely convergent for $|z - 1| \le 1$.


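4. Sufficiency. For arbitrary $\lambda$ let us set

$$e^{-\lambda(z - 1)}\left(1 + \mu_{(1)}(z - 1) + \mu_{(2)}\frac{(z - 1)^2}{2!} + \cdots\right) = 1 + L_1(z - 1) + \frac{L_2}{2!}(z - 1)^2 + \cdots, \tag{4.1}$$

where the right member, because of (3.4), is absolutely convergent for $|z - 1| \le 1$. The coefficients on the right side of (4.1) are given by

$$L_n = \mu_{(n)} - n\mu_{(n-1)}\lambda + \frac{n(n - 1)}{2!}\mu_{(n-2)}\lambda^2 - \cdots + (-1)^n\lambda^n \tag{4.2}$$

and the factorial moments may also be expressed by

$$\mu_{(n)} = L_n + nL_{n-1}\lambda + \frac{n(n - 1)}{2!}L_{n-2}\lambda^2 + \cdots + \lambda^n. \tag{4.3}$$

These relations are readily derived by expressing (4.1) symbolically as

$$e^{-\lambda(z - 1) + \mu(z - 1)} = e^{L(z - 1)}, \tag{4.4}$$

where after expansion $\mu^n$ and $L^n$ are to be replaced by $\mu_{(n)}$ and $L_n$ respectively. (Cf. Jordan [7], p. 39). From (4.1) and (3.4) there is now derived

$$\varphi(z) = e^{\lambda(z - 1)}\left(1 + L_1(z - 1) + \frac{L_2}{2!}(z - 1)^2 + \cdots\right). \tag{4.5}$$

Since the right member of (4.5) is absolutely and uniformly convergent for $|z - 1| \le 1$ for arbitrary $\lambda$, it may be expressed as

$$\varphi(z) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)e^{\lambda(z - 1)}. \tag{4.6}$$

Since the right member of (4.6) converges for $|z - 1| < R_2$, where $R_2 \ge 1$, it may be expressed as a power series about $z = 0$, or

$$\varphi(z) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)e^{-\lambda}\left(1 + \lambda z + \frac{\lambda^2 z^2}{2!} + \cdots\right). \tag{4.7}$$

Recalling now the definition of $\varphi(z)$ as given in (3.1), there is obtained by equating coefficients of like powers of $z$ in (3.1) and (4.7)

$$p(r) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots + \frac{L_n}{n!}\frac{\partial^n}{\partial\lambda^n} + \cdots\right)\frac{e^{-\lambda}\lambda^r}{r!}. \tag{4.8}$$

Since it may be readily shown that

$$\frac{\partial^n}{\partial\lambda^n}\frac{e^{-\lambda}\lambda^r}{r!} = (-1)^n\nabla^n\frac{e^{-\lambda}\lambda^r}{r!}, \tag{4.9}$$

where

$$\nabla\frac{e^{-\lambda}\lambda^r}{r!} = \frac{e^{-\lambda}\lambda^r}{r!} - \frac{e^{-\lambda}\lambda^{r-1}}{(r - 1)!}$$

and

$$\nabla^n\frac{e^{-\lambda}\lambda^r}{r!} = \nabla^{n-1}\frac{e^{-\lambda}\lambda^r}{r!} - \nabla^{n-1}\frac{e^{-\lambda}\lambda^{r-1}}{(r - 1)!},$$

we may also write (4.8) as

$$p(r) = \frac{e^{-\lambda}\lambda^r}{r!} - L_1\nabla\frac{e^{-\lambda}\lambda^r}{r!} + \frac{L_2}{2!}\nabla^2\frac{e^{-\lambda}\lambda^r}{r!} - \frac{L_3}{3!}\nabla^3\frac{e^{-\lambda}\lambda^r}{r!} + \cdots.$$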

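5. Necessity. Assume that the function $p(r)$ of (2.1), for arbitrary $\lambda$, is given by the absolutely convergent series

$$p(r) = \left(1 + L_1\frac{\partial}{\partial\lambda} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2} + \cdots\right)\frac{e^{-\lambda}\lambda^r}{r!}. \tag{5.1}$$

Since $e^{-\lambda}\lambda^r/r!$ is continuous with respect to $\lambda$, there follows, where $z$ is a complex variable and $|z| \le 1$,

$$\sum_{r=0}^{\infty} z^r p(r) = \sum_{r=0}^{\infty} z^r\left(\frac{e^{-\lambda}\lambda^r}{r!} + L_1\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!} + \frac{L_2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots\right)$$

$$= e^{\lambda(z - 1)}\left(1 + L_1(z - 1) + \frac{L_2}{2!}(z - 1)^2 + \cdots\right) \tag{5.2}$$

$$= 1 + M_1(z - 1) + \frac{M_2}{2!}(z - 1)^2 + \cdots + \frac{M_n}{n!}(z - 1)^n + \cdots,$$

where

$$M_n = L_n + nL_{n-1}\lambda + \frac{n(n - 1)}{2!}L_{n-2}\lambda^2 + \cdots + \lambda^n. \tag{5.3}$$

From (5.2) it follows that

$$M_n = \mu_{(n)}, \tag{5.4}$$

where $\mu_{(n)}$ is as defined in (3.3). Since (5.1) becomes for $r = 0$, $\lambda = 0$

$$1 - \mu_{(1)} + \frac{1}{2!}\mu_{(2)} - \frac{1}{3!}\mu_{(3)} + \cdots, \tag{5.5}$$

the assumed absolute convergence implies that

$$1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \frac{1}{3!}|\mu_{(3)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots \tag{5.6}$$

converges.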


6. Remarks. Obrechkoff [11] shows that his result includes those of Pollaczek-Geiringer [12], Szegö [12] (p. 110) and Jacob [5]. His theorem states that if
the function p(r), (r = 0, 1, 2, ---), satisfies the following conditions 


(6.1) D2! | p(r) 


is convergent for each finite number A, and 


(4\)" | p(r) | a a 
(6.2) eT ie (e*N /r!) 


tends toward zero as n increases indefinitely then p(r) may be expressed in a 
convergent Charlier Type B series. 
Uspensky [18] shows that if

$$\sum_{r=0}^{\infty} z^r p(r) \tag{6.3}$$

has a radius of convergence $R > 2$ then $p(r)$ may be expressed in a convergent Charlier Type B series.

Schmidt [16] shows that a necessary and sufficient condition for the convergence is that the function $\varphi(z)$ defined as in (3.1) (he does not explicitly impose the condition (2.1) on $p(r)$) be regular inside the two circles $|z| < 1$ and $|z - 1| < 1$ and with all its derivatives is continuous on the peripheries also. In the case that $p(r) \ge 0$, the condition (2.5) is stronger, in fact in this case Schmidt [16] shows that a necessary and sufficient condition is that

$$\lim_{r \to \infty} p(r)\,2^r r^k = 0$$

for all integral $k \ge 0$. If $p(r) \ge 0$, then Uspensky's condition is only just enough stronger than Schmidt's to keep it from being sufficient.

If (6.1) is satisfied, or if (6.3) is satisfied then (3.1) is absolutely convergent for $|z| \le 2$. Therefore, the point $z = 2$ is contained in the circle of convergence of (3.2) or (3.4) which implies that

$$1 + |\mu_{(1)}| + \frac{1}{2!}|\mu_{(2)}| + \cdots + \frac{1}{n!}|\mu_{(n)}| + \cdots$$

converges.

It is deemed worthy of special mention to point out, as both Schmidt and Uspensky have done, the striking fact that the necessary and sufficient condition for the validity of (2.4) is independent of $\lambda$. This arbitrariness of $\lambda$ enables us to dispose of it so as to obtain better convergence. Indeed if we set $\lambda = \mu_{(1)}$ then as is evident from (4.2) $L_1 = 0$.


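7. Special cases. It is of interest to note that (4.8) is the Taylor expansion if $p(r) = e^{-\mu}\mu^r/r!$, $(r = 0, 1, 2, \cdots)$, for then (4.2) becomes

$$L_n = (\mu - \lambda)^n \tag{7.1}$$

since for the Poisson Exponential Distribution $e^{-\mu}\mu^r/r!$, $(r = 0, 1, 2, \cdots)$, $\mu_{(n)} = \mu^n$ and (4.8) is then

$$\frac{e^{-\mu}\mu^r}{r!} = \frac{e^{-\lambda}\lambda^r}{r!} + (\mu - \lambda)\frac{\partial}{\partial\lambda}\frac{e^{-\lambda}\lambda^r}{r!} + \frac{(\mu - \lambda)^2}{2!}\frac{\partial^2}{\partial\lambda^2}\frac{e^{-\lambda}\lambda^r}{r!} + \cdots. \tag{7.2}$$

If $p(r)$ is finite, that is if $p(r) = 0$ for $r \ge n + 1$ then $\mu_{(k)} = 0$ for $k \ge n + 1$. Thus, for a finite function the condition (2.5) is satisfied.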
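
Relation (7.1) can be verified mechanically from the definition (4.2). The sketch below is an illustration only: the function name `charlier_L` and the values of $\mu$ and $\lambda$ are hypothetical, and exact fractions are used so that the comparison is an identity rather than a floating-point approximation.

```python
from fractions import Fraction
from math import comb


def charlier_L(n, factorial_moments, lam):
    """L_n of (4.2): sum over j of (-1)^j * C(n, j) * mu_(n-j) * lam^j."""
    return sum((-1) ** j * comb(n, j) * factorial_moments[n - j] * lam ** j
               for j in range(n + 1))


# Poisson case: the n-th factorial moment is mu^n (arbitrary test values).
mu, lam = Fraction(3), Fraction(1, 2)
poisson_moments = [mu ** m for m in range(8)]

poisson_check = all(charlier_L(n, poisson_moments, lam) == (mu - lam) ** n
                    for n in range(8))

# Choosing lam equal to the first factorial moment makes L_1 vanish.
L1_at_mean = charlier_L(1, poisson_moments, mu)
```

The check is just the binomial theorem applied to $(\mu - \lambda)^n$, which is what makes (7.2) a Taylor expansion in this case.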


8. Factorial moments. For functions $p(r)$, $(r = 0, 1, 2, \cdots)$, satisfying (2.5), there may be derived from (3.1) and (3.4) the relation

$$r!\,p(r) = \mu_{(r)} - \mu_{(r+1)} + \frac{1}{2!}\mu_{(r+2)} - \cdots, \qquad (r = 0, 1, 2, \cdots), \tag{8.1}$$

since each side is $\varphi^{(r)}(0)$ derived respectively from (3.1) and (3.4). It should be noted that for $\lambda = 0$ (4.5) leads to (8.1) rather than (4.8) so that (8.1) may be considered as the Charlier Type B series for $\lambda = 0$. The result (8.1) was derived for finite functions by Steffensen [17]. (Also compare Kaplansky [8].) This may also be expressed symbolically by

$$p(r) = \mu^r e^{-\mu}/r!, \qquad (r = 0, 1, 2, \cdots), \tag{8.2}$$

where after expansion $\mu^n$ is to be replaced by $\mu_{(n)}$. It is of interest to note the relation between the symbolic expression for $p(r)$ as a Poisson Exponential in (8.2) and the series (4.8), for (4.8) may be expressed symbolically as

$$p(r) = e^{L(\partial/\partial\lambda)}\frac{e^{-\lambda}\lambda^r}{r!} = \frac{e^{-(\lambda + L)}(\lambda + L)^r}{r!} = \mu^r e^{-\mu}/r!, \tag{8.3}$$

since $e^{a(\partial/\partial x)}f(x) = f(x + a)$ and the relations (4.2), (4.3), (4.4).
9. Illustrations. Consider the function

$$p(r) = 1/2^{r+1}, \qquad (r = 0, 1, 2, \cdots). \tag{9.1}$$

For this function

$$\varphi(z) = \sum_{r=0}^{\infty} z^r p(r) = 1/(2 - z) \tag{9.2}$$

and

$$\varphi^{(n)}(1) = \mu_{(n)} = n! \tag{9.3}$$

so that (2.5) becomes

$$1 + 1 + 1 + \cdots, \tag{9.4}$$

which does not converge. (It may be of interest to note that for this case (8.1) yields

$$p(0) = 1 - 1 + 1 - 1 + \cdots. \tag{9.5}$$

The series on the right in (9.5) is not convergent but is summable $C_1$ to $\tfrac{1}{2}$. For the latter see for example R. P. Agnew, [19].) In this case the first several coefficients of (4.8) are for $\lambda = 1$,

$$L_1 = 0, \quad \frac{L_2}{2!} = .5000, \quad \frac{L_3}{3!} = .3333, \quad \frac{L_4}{4!} = .3750, \quad \frac{L_5}{5!} = .3667, \quad \frac{L_6}{6!} = .3681, \quad \frac{L_7}{7!} = .3679. \tag{9.6}$$

Let us now consider the function

$$p(0) = \tfrac{1}{2}, \qquad p(r) = 1/3^r, \quad (r = 1, 2, \cdots). \tag{9.7}$$

For this function

$$\varphi(z) = \sum_{r=0}^{\infty} z^r p(r) = \frac{1}{2} + \frac{z}{3 - z} \tag{9.8}$$

and

$$\varphi^{(n)}(1) = \mu_{(n)} = n!\left(\frac{3}{2}\right)\frac{1}{2^n}, \qquad (n = 1, 2, \cdots), \tag{9.9}$$

so that (2.5) becomes

$$1 + \left(\frac{3}{2}\right)\frac{1}{2} + \left(\frac{3}{2}\right)\frac{1}{2^2} + \left(\frac{3}{2}\right)\frac{1}{2^3} + \cdots, \tag{9.10}$$
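which converges. For this case (8.1) yields

$$p(0) = 1 - \left(\frac{3}{2}\right)\frac{1}{2} + \left(\frac{3}{2}\right)\frac{1}{2^2} - \left(\frac{3}{2}\right)\frac{1}{2^3} + \cdots = \frac{1}{2},$$

$$p(1) = \left(\frac{3}{2}\right)\frac{1}{2} - 2!\left(\frac{3}{2}\right)\frac{1}{2^2} + \frac{3!}{2!}\left(\frac{3}{2}\right)\frac{1}{2^3} - \cdots = \frac{1}{3}, \tag{9.11}$$

etc.

In this case, the first several coefficients of (4.8) are for $\lambda = 0.75$

$$L_1 = 0, \quad \frac{L_2}{2!} = .093750, \quad \frac{L_3}{3!} = .046875, \quad \frac{L_4}{4!} = .019043, \quad \frac{L_5}{5!} = .010840, \quad \frac{L_6}{6!} = .005173, \quad \frac{L_7}{7!} = .002622. \tag{9.12}$$

Let us now consider the function (suggested by Prof. C. Wexler)

$$p(0) = \frac{5}{3}, \qquad p(r) = (-1)^r\,\frac{5}{3}\left(\frac{2}{3}\right)^r, \quad (r = 1, 2, \cdots). \tag{9.13}$$

For this function

$$\sum_{r=0}^{\infty} p(r) = 1, \qquad \sum_{r=0}^{\infty} |p(r)| = 5, \tag{9.14}$$

$$\varphi(z) = \sum_{r=0}^{\infty} z^r p(r) = 5/(3 + 2z), \tag{9.15}$$

$$\varphi^{(n)}(1) = \mu_{(n)} = (-1)^n n!\,(2/5)^n. \tag{9.16}$$

In this case (2.5) becomes

$$1 + \frac{2}{5} + \left(\frac{2}{5}\right)^2 + \left(\frac{2}{5}\right)^3 + \cdots, \tag{9.17}$$

which converges and (8.1) yields

$$p(0) = 1 + \frac{2}{5} + \left(\frac{2}{5}\right)^2 + \left(\frac{2}{5}\right)^3 + \cdots = 5/3,$$

$$p(1) = -2/5 - 2!\,(2/5)^2 - \frac{3!}{2!}(2/5)^3 - \cdots = -\frac{10}{9}, \tag{9.18}$$

etc.

Note that for this case (6.1) or (6.3) are not satisfied. Using $\lambda = 1$, it is found that

$$L_1 = -1.4, \quad \frac{L_2}{2!} = 1.06, \quad \frac{L_3}{3!} = -.5906, \quad \frac{L_4}{4!} = .2779, \quad \cdots. \tag{9.19}$$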
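
The values of $p(0)$ and $p(1)$ quoted for the second and third illustrations can be recovered numerically from relation (8.1). The Python sketch below is only a check under stated assumptions: the function names are hypothetical, the series is truncated at 60 terms (ample here because the terms decay geometrically), and exact fractions are used until the final comparison.

```python
from fractions import Fraction
from math import factorial


def p_from_factorial_moments(r, mu, terms=60):
    """Partial sum of (8.1): p(r) = (1/r!) * sum_j (-1)^j * mu(r + j) / j!."""
    total = sum((-1) ** j * mu(r + j) / factorial(j) for j in range(terms))
    return total / factorial(r)


def mu_second(n):
    # Factorial moments of p(0) = 1/2, p(r) = 1/3^r: mu_(n) = n! * (3/2) / 2^n.
    return Fraction(1) if n == 0 else factorial(n) * Fraction(3, 2) / 2 ** n


def mu_third(n):
    # Factorial moments of the third illustration: (-1)^n * n! * (2/5)^n.
    return (-1) ** n * factorial(n) * Fraction(2, 5) ** n


second_p0 = float(p_from_factorial_moments(0, mu_second))
second_p1 = float(p_from_factorial_moments(1, mu_second))
third_p0 = float(p_from_factorial_moments(0, mu_third))
third_p1 = float(p_from_factorial_moments(1, mu_third))
```

The first illustration is deliberately left out of this check, since for it the series (8.1) does not converge in the ordinary sense and a plain partial sum would be meaningless.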
NOTES


This section is devoted to brief research expository articles on methodology
and other short items.

ON SMALL-SAMPLE ESTIMATION


By George W. Brown
Iowa State College


1. Summary. This paper discusses some of the concepts underlying small 
sample estimation and reexamines, in particular, the current notions on ‘‘un- 
biased” estimation. Alternatives to the usual unbiased property are examined 
with respect to invariance under simultaneous one-to-one transformation of 
parameter and estimate; one of these alternatives, closely related to the maxi- 
mum likelihood method, seems to be new. The property of being unbiased in 
the likelihood sense is essentially equivalent to the statement that the estimate 
is a maximum likelihood estimate based on some distribution derived by inte- 
gration from the original sampling distribution, by virtue of a “hereditary” 
property of maximum likelihood estimation. 

An exposition of maximum likelihood estimation is given in terms of optimum 
pairwise selection with equal weights, providing a type of rationale for small 
sample estimation by maximum likelihood. 


2. Introduction. In large sample theory of estimation the problems are 
generally formulated in terms of a random variable $x = (x_1, x_2, \cdots, x_n)$ 
and a product distribution with, say, a density 
$g(x|\theta) = f(x_1|\theta) f(x_2|\theta) \cdots f(x_n|\theta)$, 
where n is permitted to increase without limit. For small sample theory it is 
sufficient to consider an arbitrary distribution, not necessarily of product form, 
depending on a parameter $\theta$. For convenience we will assume a distribution 
density of fixed form $g(x|\theta)$, where x is in Euclidean n-space and $\theta$ in 
Euclidean k-space, k < n. Granting at the outset that a complete rationale for 
estimation must be based on considerations like those of Wald [4, 1939] dealing 
with specified risk functions, it is still a difficult process, in practice, to 
specify the risk functions and solve the ensuing mathematics problems. It may 
still be to the point, then, to consider general properties that estimates might 
be required to have in order to be considered "acceptable", or perhaps even 
"optimum", over a class of "acceptable" estimates. 

In large-sample theory the situation is fairly simple. Consistent estimates 
have the property that the estimate converges in probability to the true param- 
eter value. ‘Best’ or “optimum” estimates are defined in terms of the order 
of convergence, or asymptotic variance. All reasonable definitions of “‘optimum”’ 
become asymptotically equivalent, since they all measure essentially the rate of 
convergence, so that one might ask for least variance, or least expected absolute 
deviation, or least expected kth power, without affecting the optimum estimate, 
in general. Moreover, the consistency property and the optimum properties 
are in general invariant under simultaneous one-to-one transformation of the 
parameter and its estimate, i.e., the square of an asymptotically optimum 
estimate of $\sigma$ will be an asymptotically optimum estimate of $\sigma^2$. 
Finally, a general estimation method, the method of maximum likelihood, leads 
to optimum estimates in large samples. 

In small samples, on the other hand, the search for corresponding criteria has 
led to the investigation of best “unbiased” estimates, and the like, where few, 
if any, of the definitions discussed possess an invariance property under simul- 
taneous one-to-one transformation of the parameter and its estimate. 


3. Unbiased estimation. To ensure, in small-sample estimation, that an 
estimate bears some relation to the parameter it is estimating, it has become the 
custom to require that an estimate be unbiased, which means that the expected 
value of the estimate agrees with the parameter value. This condition was sug- 
gested by the consistency property which is required in large-sample estimation. 
It ensures, moreover, that the average of a large number of independent estimates 
made on the same basis will provide a consistent estimate, in the large sample 
sense. While this consistency property of the average may at times be conveni- 
ent in practical situations, the fact remains that the problem of estimation from 
a number of such observations is a different estimation problem, the ‘‘best”’ 
solution to which need not be the average of the “‘best’’ solutions of the original 
problem corresponding to estimation of $\theta$ from a single observation on x, 
where x has a density $g(x|\theta)$. More to the point, however, is the objection 
that an unbiased estimate of a parameter does not in general transform into an 
unbiased estimate when both estimate and parameter are subjected to the same 
one-to-one transformation. Moreover, one can easily construct situations for 
which the only acceptable unbiased estimates are clearly inferior, from almost 
any point of view, to estimates which are biased (Girshick, Mosteller and 
Savage [1, 1946], and Halmos [2, 1946]). 

It may be of interest to consider a few reasonable alternatives to the lack of 
bias requirement, which seem to accomplish as much as the conventional defini- 
tion and which, in addition, have an invariance under one-to-one transformation 
of the parameter and estimate. To avoid confusion, let us attach the qualifying 
prefix “‘mean”’ to the usual unbiased property, so that an estimate will be said 
to be mean-unbiased if its expected value agrees with the parameter value. 

Consider as one alternative the following property. An estimate of a 
one-dimensional parameter $\theta$ will be said to be median-unbiased if, for 
fixed $\theta$, the median of the distribution of the estimate is at the value 
$\theta$, i.e., the estimate underestimates just as often as it overestimates. 
This requirement seems for most purposes to accomplish as much as the 
mean-unbiased requirement and has the additional property that it is invariant 
under one-to-one transformation. 
A different alternative requirement which is invariant under transformations 
is suggested by the definition of unbiased tests of significance (Neyman and 
Pearson [3, 1936]). Let us say that an estimate is likelihood-unbiased if 
$h(\theta \mid \theta') < h(\theta \mid \theta)$, where the estimate $\hat\theta$ 
has probability density $h(\hat\theta \mid \theta)$. In other words, an 
estimation method is likelihood-unbiased if estimates in the neighborhood of a 
given parameter value $\theta$ would occur more frequently when the true value is 
itself $\theta$ than when it differs from $\theta$. On intuitive grounds this 
seems to be an acceptable kind of requirement, applicable to a very general class 
of estimation problems. It is evident that the assumption of a density plays no 
important role here; the situation is analogous to the maximum likelihood 
situation. The property itself is invariant under simultaneous one-to-one 
transformations of parameter and estimate for the same reason that maximum 
likelihood estimates are invariant under such transformations; in fact, one can 
readily see that the likelihood-unbiased condition is equivalent to requiring 
that $\hat\theta$ have such a distribution, as a function of $\theta$, that the 
maximum likelihood estimate of $\theta$ based on $\hat\theta$ will be actually 
equal to $\hat\theta$. The obvious implication of this fact is that if a function 
$t(x)$ is given (possibly a sufficient statistic for $\theta$) then there is an 
essentially unique likelihood-unbiased estimate $\hat\theta$ based on $t$, 
obtained by finding the maximum likelihood estimate of $\theta$ in the 
distribution of $t$ as a function of $\theta$. 

As an example, consider the estimation of $\sigma^2$ from a sample of n 
observations from a normal distribution. Let $S^2$ be the usual sum of squares, 
where $S^2/\sigma^2$ is distributed like $\chi^2$ on n - 1 degrees of freedom. 
Then the only likelihood-unbiased estimate of $\sigma^2$ based on $S^2$ is 
$S^2/(n - 1)$. In this case $S^2/(n - 1)$ is also mean-unbiased, a fact which is 
normally quoted as justification for the division by n - 1. Curiously enough, it 
is customary to estimate $\sigma$ by $\sqrt{S^2/(n - 1)}$, even though this is a 
biased estimate of $\sigma$, according to the usual notion of "unbiased", 
referred to here as "mean-unbiased". On the other hand, $\sqrt{S^2/(n - 1)}$ is a 
perfectly good likelihood-unbiased estimate of $\sigma$, by virtue of the 
invariance under transformations. It might be pointed out, in passing, that the 
estimate $S^2/(n - 1)$ does not have minimum mean square error about $\sigma^2$, 
but that the optimum divisor for minimizing the mean square error about 
$\sigma^2$ is n + 1. 
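
The remark about the divisor n + 1 can be checked by exact arithmetic. The sketch below is an illustration only, not part of the original note: it assumes the standard facts that a $\chi^2$ variable on n - 1 degrees of freedom has mean n - 1 and variance 2(n - 1), and the function names are hypothetical.

```python
from fractions import Fraction


def mse_over_sigma4(n, c):
    """Exact E[(S^2/c - sigma^2)^2] / sigma^4 when S^2/sigma^2 is chi-square on n-1 d.f."""
    k = Fraction(n - 1)        # degrees of freedom
    c = Fraction(c)
    variance = 2 * k / c**2    # Var(S^2/c) / sigma^4, using Var(chi-square_k) = 2k
    bias = k / c - 1           # (E[S^2/c] - sigma^2) / sigma^2
    return variance + bias**2


def best_integer_divisor(n):
    """Integer divisor in 1 .. 3n-1 giving the smallest mean square error."""
    return min(range(1, 3 * n), key=lambda c: mse_over_sigma4(n, c))


if __name__ == "__main__":
    n = 10
    for c in (n - 1, n, n + 1):
        print(c, float(mse_over_sigma4(n, c)))
    print("best divisor:", best_integer_divisor(n))
```

The divisor n - 1 removes the bias term entirely, while n + 1 trades a small bias for a larger reduction in variance.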

The fact that a likelihood-unbiased estimate is the maximum likelihood 
estimate based on the distribution of the estimate itself suggests further 
examination of maximum likelihood estimates. If we define a simple estimate as 
one which completely determines a probability distribution for x, then we have 
as a theorem, the following: 

A simple maximum likelihood estimate $\hat\theta(x)$ is likelihood-unbiased. 
What this means is essentially that maximum likelihood is "hereditary", i.e., if 
$\hat\theta(x)$ maximizes $g(x \mid \theta)$ in a space of n dimensions, and 
$\hat\theta$ has a derived density $h(\hat\theta \mid \theta)$ in a space of 
k < n dimensions, then $\theta = \hat\theta$ maximizes 
$h(\hat\theta \mid \theta)$. The proof follows readily from the fact that 
$h(\hat\theta \mid \theta)$ is obtained by integration of $g(x \mid \theta)$ 
over all x such that $\hat\theta(x) = \hat\theta$. 
The example of estimating $\sigma^2$, quoted above, shows that the word "simple" 
cannot be omitted from the statement above. For example, the simple estimate 
in the parent distribution is the joint estimate $(\bar{x}, S^2/n)$ of 
$(m, \sigma^2)$ and in fact the joint estimate is likelihood-unbiased. On the 
other hand, $S^2/n$ is not a simple maximum likelihood estimate, and we observe 
that $S^2/n$ is not likelihood-unbiased. $S^2/(n - 1)$ is a simple maximum 
likelihood estimate of $\sigma^2$ based on the distribution of $S^2$ itself, so 
that $S^2/(n - 1)$ is, as a result, likelihood-unbiased. 
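
The two divisors can be traced to the two likelihoods being maximized. The following worked derivation is a sketch added for illustration; it assumes only the normal density of the sample and the $\chi^2$ density of $S^2/\sigma^2$ on n - 1 degrees of freedom, and the symbols $L$ and $\ell$ are introduced here for the two log-likelihoods.

```latex
% Parent distribution: log-likelihood of (m, sigma^2) given the whole sample
L(m, \sigma^2) = -\frac{n}{2}\log\sigma^2
                 - \frac{S^2 + n(\bar{x} - m)^2}{2\sigma^2} + \text{const},
\qquad
\frac{\partial L}{\partial \sigma^2}\Big|_{m = \bar{x}} = 0
\;\Longrightarrow\; \hat\sigma^2 = \frac{S^2}{n}.

% Derived distribution: density of S^2 alone, as a function of sigma^2
h(S^2 \mid \sigma^2) =
  \frac{(S^2)^{(n-3)/2}\, e^{-S^2/(2\sigma^2)}}
       {2^{(n-1)/2}\,\Gamma\!\left(\tfrac{n-1}{2}\right)\sigma^{\,n-1}},
\qquad
\ell(\sigma^2) = -\frac{n-1}{2}\log\sigma^2 - \frac{S^2}{2\sigma^2} + \text{const},

\frac{d\ell}{d\sigma^2} = -\frac{n-1}{2\sigma^2} + \frac{S^2}{2\sigma^4} = 0
\;\Longrightarrow\; \hat\sigma^2 = \frac{S^2}{n-1}.
```

The joint estimate $(\bar{x}, S^2/n)$ maximizes the first expression, while $S^2/(n - 1)$ maximizes the second, which is the likelihood relevant to an estimate based on $S^2$ alone.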

One can exhibit situations in which the conventional mean-unbiased property 
is very unnatural, while the likelihood-unbiased property may be quite natural. 
Consider, for example, the case where $\sigma^2$ is to be estimated by use of a 
$\chi^2$-distributed $S^2$ with n - 1 degrees of freedom, but subject to the 
condition $\sigma^2 > \sigma_0^2$, where $\sigma_0$ is known in advance. Then the 
estimate $\hat\sigma^2 = \max[S^2/(n - 1), \sigma_0^2]$ is certainly biased 
according to conventional definitions, but is nevertheless likelihood-unbiased. 
To get a mean-unbiased estimate when $\sigma^2$ is near to $\sigma_0^2$ is 
impossible except by admitting estimates less than $\sigma_0^2$, which is clearly 
foolish if it is known that $\sigma^2 > \sigma_0^2$. 

It may be of interest to include a brief discussion of maximum likelihood 
estimation in terms of pairwise selection of alternatives, providing a sort of 
optimum property for maximum likelihood estimation in small samples, in addition 
to the likelihood-unbiased property. Consider a choice to be made between only 
two alternative values of $\theta$, say $\theta_0$ and $\theta_1$, by dividing the 
sample space into two regions $S_0$ and $S_1$, such that $\theta_0$ is accepted 
when x falls in $S_0$ and $\theta_1$ is accepted when x falls in $S_1$. Then 

$P_{\theta_0}(S_0) + P_{\theta_0}(S_1) = P_{\theta_1}(S_0) + P_{\theta_1}(S_1) = 1.$ 

$P_{\theta_1}(S_0)$ is the probability of making the error of accepting 
$\theta_0$ when $\theta = \theta_1$, and $1 - P_{\theta_0}(S_0)$ is the 
probability of making the error of accepting $\theta_1$ when 
$\theta = \theta_0$. If the two errors are weighted equally, it is evident that 
a "best" test will choose $S_0$ so as to minimize 
$P_{\theta_1}(S_0) + 1 - P_{\theta_0}(S_0)$. It is well known that $S_0$ will 
minimize the indicated quantity if $S_0$ consists of all points x such that 
$g(x \mid \theta_0) > g(x \mid \theta_1)$. Thus we may speak of the region $S_0$ 
defined by $g(x \mid \theta_0) > g(x \mid \theta_1)$ as an optimum equal risk 
acceptance region for $\theta_0$ against $\theta_1$. Now if we transfer our 
attention to the general estimation problem we see that the maximum likelihood 
estimate $\hat\theta(x)$ is that value of $\theta$ which would be accepted by 
the optimum equal risk acceptance procedure against all other $\theta$'s. 
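
The pairwise argument can be illustrated on a finite sample space, where every candidate region can be enumerated. The sketch below is an assumption-laden illustration rather than anything from the original note: the two probability tables, the labels `theta0` and `theta1`, and all function names are hypothetical.

```python
from itertools import combinations


def total_error(region0, g0, g1):
    """P_theta1(S0) + 1 - P_theta0(S0) for an acceptance region S0 of theta0."""
    p1_s0 = sum(g1[x] for x in region0)
    p0_s0 = sum(g0[x] for x in region0)
    return p1_s0 + 1 - p0_s0


def likelihood_region(g0, g1):
    """All sample points x with g(x | theta0) > g(x | theta1)."""
    return frozenset(x for x in g0 if g0[x] > g1[x])


def brute_force_minimum(g0, g1):
    """Smallest total error over every subset of the sample space."""
    points = list(g0)
    subsets = (s for k in range(len(points) + 1) for s in combinations(points, k))
    return min(total_error(s, g0, g1) for s in subsets)


def max_likelihood_estimate(x, densities):
    """Parameter label whose density is largest at x (wins every pairwise test)."""
    return max(densities, key=lambda theta: densities[theta][x])


if __name__ == "__main__":
    # Hypothetical probability tables on the sample space {0, 1, 2, 3}.
    g0 = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}
    g1 = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.40}
    region = likelihood_region(g0, g1)
    print(sorted(region), total_error(region, g0, g1), brute_force_minimum(g0, g1))
```

With more than two parameter values, `max_likelihood_estimate` returns the value that would be accepted in each pairwise comparison, which is the sense in which the text describes the maximum likelihood estimate.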
A NOTE ON REGRESSION ANALYSIS 


By ABRAHAM WALD 


Columbia University 


1. Introduction. In regression analysis a set of variables 
$y, x_1, \cdots, x_p$ is considered where y is called the dependent variable and 
$x_1, \cdots, x_p$ are the independent variables. Let $y_\alpha$ denote the 
$\alpha$th observation on y and $x_{i\alpha}$ the $\alpha$th observation on 
$x_i$, $(i = 1, \cdots, p;\ \alpha = 1, \cdots, N)$. The observations 
$x_{i\alpha}$ are treated as given constants, while the observations 
$y_1, \cdots, y_N$ are regarded as chance variables. The following two 
assumptions are usually made concerning the joint distribution of the variates 
$y_1, \cdots, y_N$: 

(a) The variates $y_1, \cdots, y_N$ are normally and independently distributed 
with a common unknown variance $\sigma^2$. 

(b) The expected value of $y_\alpha$ is equal to 
$\beta_1 x_{1\alpha} + \cdots + \beta_p x_{p\alpha}$ where 
$\beta_1, \cdots, \beta_p$ are unknown constants. 

In some problems it seems reasonable to assume that the regression coefficients 
$\beta_1, \cdots, \beta_p$ are not constants, but chance variables. This leads to 
a different probability model for regression analysis and the object of this note 
is to discuss certain aspects of this model. In what follows in this note we 
shall make the following assumptions concerning the joint distribution of the 
chance variables $y_1, \cdots, y_N;\ \beta_1, \cdots, \beta_p$. 

Assumption 1. For given values of $\beta_1, \cdots, \beta_p$ the joint 
conditional probability density function of $y_1, \cdots, y_N$ is given by 

(1.1) $\frac{1}{(2\pi)^{N/2}\sigma^N} \exp\left[-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{N} (y_\alpha - \beta_1 x_{1\alpha} - \cdots - \beta_p x_{p\alpha})^2\right].$ 

Assumption 2. The regression coefficients $\beta_1, \cdots, \beta_p$ are 
independently distributed. 

Assumption 3. The regression coefficients $\beta_1, \cdots, \beta_r$, (r < p), 
are normally distributed with zero means and a common variance $\sigma'^2$. 

The purpose of this note is to derive confidence limits for the ratio 
$\sigma'^2/\sigma^2$. Such confidence limits have been derived by the author [1] 
for analysis of variance problems assuming that there are only main effects but 
no interactions. The regression problem treated in the present note is much more 
general and includes all the analysis of variance problems with or without 
interactions as special cases. 

It should be remarked that Assumptions 2 and 3 do not exclude the case where 
$\beta_{r+1}, \cdots, \beta_p$ are constants. 

2. Derivation of confidence limits for the ratio $\sigma'^2/\sigma^2$. Let 
$b_1, \cdots, b_p$ be the sample estimates of $\beta_1, \cdots, \beta_p$ obtained 
by the method of least squares. We shall denote the difference $b_i - \beta_i$ 
by $\epsilon_i$, $(i = 1, \cdots, p)$. It is known that for given values of 
$\beta_1, \cdots, \beta_p$ the conditional joint distribution of 
$\epsilon_1, \cdots, \epsilon_p$ is normal with zero means and 
variance-covariance matrix $\|c_{ij}\| \sigma^2$ where 

(2.1) $\|c_{ij}\| = \|a_{ij}\|^{-1}$ 

and 

(2.2) $a_{ij} = \sum_{\alpha=1}^{N} x_{i\alpha} x_{j\alpha}, \quad (i, j = 1, \cdots, p).$ 
Since the conditional distribution of $\epsilon_1, \cdots, \epsilon_p$ does not 
depend on the values of $\beta_1, \cdots, \beta_p$, the unconditional distribution 
of $\epsilon_1, \cdots, \epsilon_p$ is the same as the conditional one, and the 
set of variates $(\beta_1, \cdots, \beta_p)$ is distributed independently of the 
set $(\epsilon_1, \cdots, \epsilon_p)$. From this and Assumptions 2 and 3 it 
follows that $b_1, \cdots, b_r$ have a joint normal distribution and that 

(2.3) $E b_i = 0, \quad (i = 1, \cdots, r)$ 

and 

(2.4) $E b_i b_j = \left(c_{ij} + \delta_{ij} \frac{\sigma'^2}{\sigma^2}\right) \sigma^2, \quad (i, j = 1, \cdots, r)$ 

where $\delta_{ij} = 0$ for $i \ne j$ and $= 1$ for $i = j$. 

We shall denote $\sigma'^2/\sigma^2$ by $\lambda$ and the elements of the inverse 
of $\|c_{ij} + \delta_{ij}\lambda\|$ by $d_{ij}(\lambda)$, i.e., 

(2.5) $\|d_{ij}(\lambda)\| = \|c_{ij} + \delta_{ij}\lambda\|^{-1}, \quad (i, j = 1, \cdots, r).$ 

Then the quadratic form 

(2.6) $Q(\lambda) = \frac{1}{\sigma^2} \sum_{j=1}^{r} \sum_{i=1}^{r} d_{ij}(\lambda) b_i b_j$ 

has the $\chi^2$ distribution with r degrees of freedom. 

It is known that for any given values of $\beta_1, \cdots, \beta_p$, 
$b_1, \cdots, b_p$ the quadratic form 

(2.7) $Q_a = \frac{1}{\sigma^2} \sum_{\alpha=1}^{N} (y_\alpha - b_1 x_{1\alpha} - \cdots - b_p x_{p\alpha})^2$ 

has the $\chi^2$ distribution with N - p degrees of freedom provided that the 
rank of the matrix $\|x_{i\alpha}\|$ is p. Hence $Q_a$ and $Q(\lambda)$ are 
independently distributed and the ratio 

(2.8) $F = \frac{N - p}{r} \frac{Q(\lambda)}{Q_a}$ 

has the F-distribution with r and N - p degrees of freedom. 
Let $F_1$ and $F_2$ be two values chosen so that 

(2.9) $\text{Prob.}\{F_1 < F < F_2\} = c$ 

where c is a given positive constant less than 1. Then the set of all values 
$\lambda$ for which the inequality 

(2.10) $F_1 < \frac{N - p}{r} \frac{Q(\lambda)}{Q_a} < F_2$ 

holds forms a confidence set for $\lambda$ with the confidence coefficient c. 

We shall now show that $Q(\lambda)$ is a monotonic function of $\lambda$ and, 
therefore, the confidence set determined by (2.10) is an interval. Let 
$\|g_{ij}\|$, $(i, j = 1, \cdots, r)$, be an orthogonal matrix and let 

(2.11) $b_i^* = \sum_{j=1}^{r} g_{ij} b_j.$ 

It then follows from (2.3) and (2.4) that 

(2.12) $E(b_i^*) = 0, \quad (i = 1, \cdots, r)$ 

and 

(2.13) $E(b_i^* b_j^*) = (c_{ij}^* + \delta_{ij}\lambda)\sigma^2, \quad (i, j = 1, \cdots, r)$ 

where 

(2.14) $c_{ij}^* = \sum_{l} \sum_{k} g_{ik} g_{jl} c_{kl}.$ 

Let 

(2.15) $\|d_{ij}^*(\lambda)\| = \|c_{ij}^* + \delta_{ij}\lambda\|^{-1}, \quad (i, j = 1, \cdots, r)$ 

and put 

$Q^*(\lambda) = \frac{1}{\sigma^2} \sum_{j} \sum_{i} d_{ij}^*(\lambda) b_i^* b_j^*.$ 

It is easy to verify that $Q^*(\lambda)$ is identically equal to $Q(\lambda)$. 
Hence, to prove the monotonicity of $Q(\lambda)$, it is sufficient to show that 
$Q^*(\lambda)$ is a monotonic function of $\lambda$. Since no restrictions as to 
the choice of the orthogonal matrix $\|g_{ij}\|$ are made, we shall choose it so 
that the matrix $\|c_{ij}^*\|$ becomes diagonal, i.e., $c_{ij}^* = 0$ for 
$i \ne j$, $(i, j = 1, \cdots, r)$. Then 

(2.16) $d_{ij}^*(\lambda) = 0$ for $i \ne j$ 

and 

(2.17) $d_{ii}^*(\lambda) = \frac{1}{c_{ii}^* + \lambda}.$ 

Hence 

(2.18) $Q(\lambda) = Q^*(\lambda) = \frac{1}{\sigma^2} \sum_{i=1}^{r} \frac{b_i^{*2}}{c_{ii}^* + \lambda}$ 

is a monotonically decreasing function of $\lambda$. The confidence set 
determined by (2.10) is, therefore, an interval. 
The upper end point of the confidence interval is the root in $\lambda$ of the 
equation 

(2.19) $\frac{N - p}{r} \frac{Q(\lambda)}{Q_a} = F_1$ 

and the lower end point is the root in $\lambda$ of the equation 

(2.20) $\frac{N - p}{r} \frac{Q(\lambda)}{Q_a} = F_2.$ 

If equation (2.20) has no root, the lower end point of the confidence interval 
is put equal to zero. 
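
Because the ratio in (2.19) and (2.20) is decreasing in $\lambda$, the two end points can be found by bisection. The sketch below is one possible numerical rendering, under assumptions not made in the note: the coefficients are supplied already rotated to the diagonal form of (2.18) (`b_star`, `c_star`), `resid_ss` stands for the residual sum of squares $\sigma^2 Q_a$, and the values of $F_1$ and $F_2$ are passed in by the caller. All names are hypothetical.

```python
def scaled_q(lam, b_star, c_star):
    """sigma^2 * Q(lambda) in the diagonal form (2.18)."""
    return sum(b * b / (c + lam) for b, c in zip(b_star, c_star))


def f_ratio(lam, b_star, c_star, resid_ss, n_obs, p):
    """(N - p)/r * Q(lambda)/Q_a; the unknown sigma^2 cancels in the ratio."""
    r = len(b_star)
    return (n_obs - p) / r * scaled_q(lam, b_star, c_star) / resid_ss


def root_in_lambda(target, b_star, c_star, resid_ss, n_obs, p, hi=1e12):
    """Solve f_ratio(lambda) = target on [0, hi]; None if no root with lambda >= 0."""
    def excess(lam):
        return f_ratio(lam, b_star, c_star, resid_ss, n_obs, p) - target

    if excess(0.0) < 0:          # ratio is decreasing, so it never reaches target
        return None
    if excess(hi) > 0:
        raise ValueError("root lies beyond the search bound hi")
    lo = 0.0
    for _ in range(200):         # bisection on a monotone function
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confidence_interval(f1, f2, b_star, c_star, resid_ss, n_obs, p):
    """(lower, upper) for lambda, or None when no lambda >= 0 satisfies (2.10)."""
    upper = root_in_lambda(f1, b_star, c_star, resid_ss, n_obs, p)
    if upper is None:
        return None
    lower = root_in_lambda(f2, b_star, c_star, resid_ss, n_obs, p)
    return (0.0 if lower is None else lower, upper)
```

A call such as `confidence_interval(0.5, 4.0, [2.0, 1.0], [0.5, 0.25], 10.0, 12, 2)` uses made-up inputs; in practice $F_1$ and $F_2$ would be taken from tables of the F-distribution with r and N - p degrees of freedom.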
ON THE SHAPE OF THE ANGULAR CASE OF CAUCHY'S 
DISTRIBUTION CURVES 


By AUREL WINTNER 


The Johns Hopkins University 


1. Let $\xi$ be a linear random variable, that is, a random variable capable of 
values x represented by points of a line $-\infty < x < \infty$, and suppose, for 
simplicity, that $\xi$ has a density of probability, f(x). Then, subject to 
provisos of convergence, the series 

$F(x) = \sum_{n=-\infty}^{\infty} f(x + n)$ 

represents a periodic function, of period 1, having the following significance: 
F(x) is the density of probability of the angular random variable, say $\Xi$, 
which is obtained if all the states 

$\cdots, \xi - 2, \xi - 1, \xi, \xi + 1, \xi + 2, \cdots$ 

of the linear random variable are identified. 

In other words, if a circle of unit circumference rolls from $-\infty$ to 
$\infty$ on the $\xi$-line, then every point of the circumference collects the 
various densities of probability attached to congruent points of the 
$\xi$-line, and a state of $\Xi$ represents a point of the circumference. For a 
detailed study of the mapping $\xi \to \Xi$ or $f \to F$, cf. [2]. 

According to Poisson’s summation formula, the Fourier constants of the 
periodic function F(x) can be obtained by restricting u in g(u) to an equidistant 
sequence of discrete values, where g(u) denotes the Fourier transform of f(x); 
cf., e.g., [5], p. 78 or [9], pp. 477-478. 
2. Consider, in particular, the case in which f(x) is the density of a symmetric 
distribution which is stable in Cauchy's sense. The determination of the totality 
of these linear densities of probability is due to Lévy [6]. It was shown in [8] 
that every such f(x) = f(-x) is a decreasing function of |x|. As explained in 
[8], p. 70, this fact makes superfluous one of the axioms occurring in Gauss' 
postulational approach to "errors of observation." 

The purpose of the present note is the deduction of the angular analogue of 
the fact just quoted. The analogue states that, if f(x) is symmetric and stable, 
then the corresponding periodic F(x) is decreasing for $0 < x \le \tfrac{1}{2}$ 
(and so, for reasons of symmetry, is increasing for 
$\tfrac{1}{2} \le x \le 1$). This is contained in the italicized statement of 
§4 below. 

In view of Poisson's rule, quoted above, the periodic densities in question can 
be defined by certain Fourier series representing generalizations of elliptic 
theta-series. From this point of view, not even the existence (i.e., the 
positivity) of the periodic densities is obvious, if arbitrary values of the 
"precision constant" (denoted below by q) are allowed. The difficulties involved 
are explained in §3. 


3. If q and $\lambda$ are positive constants the first of which is less than 1, 
then the (even, periodic) function 

(1) $\theta_\lambda(x; q) = 1 + 2 \sum_{n=1}^{\infty} q^{n^\lambda} \cos nx,$ 

where $q^{n^\lambda} > 0$, has derivatives of arbitrarily high order at every 
real x. It is regular-analytic at every real x if and only if $\lambda > 0$ is 
replaced by $\lambda \ge 1$, where the sign of equality holds if and only if the 
analytic continuation (from the x-axis) is not an entire function. In fact, it is 
known that a Fourier series $\sum (a_n \cos nx + b_n \sin nx)$ is that of a 
function which is regular-analytic at every real x, and has the period $2\pi$, 
if and only if $|a_n| + |b_n|$ is majorized by a constant multiple of the nth 
power of a positive constant which is less than 1; and that the latter constant 
can be chosen arbitrarily small if and only if the analytic continuation does 
not lead to any singularity (at a $z \ne \infty$). 

Since the function (1) tends to 1 uniformly in x as $q \to +0$, if $\lambda$ is 
fixed, there belongs to every $\lambda > 0$ a positive $q^* = q^*(\lambda)$ 
having the property that 

(2) $\theta_\lambda(x; q) > 0$ for $0 \le x < 2\pi$ 

if $0 < q < q^*(\lambda)$. It is less obvious that, if q is sufficiently small 
with reference to $\lambda$, say if $0 < q < q^{**}(\lambda)$, then 

(3) $\theta_\lambda(x; q)$ is decreasing for $0 \le x \le \pi$ 

(hence, increasing for $\pi < x < 2\pi$). The existence of such a 
$q^{**}(\lambda) > 0$ for every $\lambda > 0$ can be assured as follows: 

If $s_n(x)$ denotes the nth partial sum of the Fourier series 
$\sum (\sin nx)/n$, then $s_n(x)$ is positive for $0 < x < \pi$ (Gronwall, 
Jackson; for a short proof, cf. [4]). 
Hence, a partial summation shows that the sum of a sine series, 
$\sum b_n \sin nx$, must be positive for $0 < x < \pi$ if 

$n b_n - (n + 1) b_{n+1} > 0$ and $n b_n \to 0.$ 

Since the first derivative of (1) (with respect to x) results by choosing 
$b_n = -2n q^{n^\lambda}$, it follows that (3) must be true if 

$n^2 q^{n^\lambda} - (n + 1)^2 q^{(n+1)^\lambda} > 0$ 

holds for $n = 1, 2, \cdots$. But the last inequality is readily seen to be 
satisfied from n = 1 onward if, while $\lambda$ is fixed, q tends to 0. This 
proves that $q^{**}(\lambda)$ exists for every $\lambda > 0$. 
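
The partial summation invoked above can be written out as follows. This is a sketch of the standard Abel summation step, added for illustration, with $s_0(x) = 0$ and $s_n(x)$ the partial sums of $\sum (\sin nx)/n$ as in the text.

```latex
\sum_{n=1}^{N} b_n \sin nx
  = \sum_{n=1}^{N} (n b_n)\,\bigl[s_n(x) - s_{n-1}(x)\bigr]
  = \sum_{n=1}^{N-1} \bigl[n b_n - (n+1) b_{n+1}\bigr]\, s_n(x) \;+\; N b_N\, s_N(x).
```

Every $s_n(x)$ is positive on $0 < x < \pi$ and the partial sums are bounded, so the last term vanishes as $N \to \infty$ when $n b_n \to 0$, and the remaining sum is positive when all the bracketed differences are. Applied to the negative of the derivative of (1), whose sine coefficients are $2n q^{n^\lambda}$, the bracketed differences are $2[n^2 q^{n^\lambda} - (n+1)^2 q^{(n+1)^\lambda}]$, which is the inequality stated in the text.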


4. From these deductions alone, it is quite unexpected that (the best values 
of) both $q^*(\lambda)$ and $q^{**}(\lambda)$ turn out to be independent of 
$\lambda$ when 

(4) $0 < \lambda \le 2,$ 

i.e., that (1) satisfies both (2) and (3) for 0 < q < 1, if (4) is assumed. This 
fact is of statistical significance, since, on the one hand, it is precisely the 
restriction (4) which is necessary and sufficient for the existence of Cauchy's 
(symmetric) "stable" distributions (cf. [6], pp. 254-263) and, on the other hand, 
the reduction (mod $2\pi$) of the densities of these linear distributions leads 
to the functions (1) as angular densities (cf. [9], pp. 477-478); the numerical 
value of q (< 1) being determined by the "precision" or "dispersion" of the 
resulting angular distributions. 

Under the necessary restriction (4), the linear analogue of $q^*(\lambda) = 1$ 
and of $q^{**}(\lambda) = 1$ was proved in [6], pp. 258-263 and in [8], 
pp. 71-77, respectively. It will remain undecided whether the restriction (4) is 
necessary in either of the angular cases. 


5. Suppose that $\lambda$ has a fixed value in the range (4). Then there exists 
a monotone function of t, say $\alpha_\lambda(t)$, for which 

$\exp(-u^\lambda) = \int_0^\infty \exp(-u^2 t) \, d\alpha_\lambda(t)$ 

is an identity in u, where $0 \le u < \infty$ (cf. [1], p. 769, where further 
references will be found). Hence, a change of variables shows that 

$q^{n^\lambda} = \int_0^\infty q^{n^2 t} \, d\alpha_\lambda(t \, |\log q|^{1 - 2/\lambda})$ 

is an identity in q and n, where 0 < q < 1 and $n = 0, 1, 2, \cdots$ (the 
integration variable is t). Consequently, from (1), 

$\theta_\lambda(x; q) = \int_0^\infty \theta_2(x; q^t) \, d\alpha_\lambda(t \, |\log q|^{1 - 2/\lambda}),$ 

where 0 < q < 1 and $-\infty < x < \infty$. In fact, the legitimacy of the 
term-by-term integration is obvious from 0 < q < 1 and 
$d\alpha_\lambda \ge 0$ (even though the integrals are improper). 

6. Since $\alpha_\lambda$ is a non-decreasing function, it is clear from the 
last formula line that both (2) and (3) will be proved for 0 < q < 1 and for 
every $\lambda$ (satisfying (4)), if it is ascertained that both (2) and (3) hold 
for 0 < q < 1 when $\lambda = 2$. But the case $\lambda = 2$ of (1) is an 
elliptic theta-function, for which both properties in question (cf. the diagram 
in [3], p. 44) are known; a simple proof can be concluded from what, in Hecke's 
terminology, is the Eulerian factorization of $\theta_2(x; q)$, as follows: 

According to Jacobi, the factorization of the case $\lambda = 2$ of (1) is 

$\theta_2(x; q) = \prod_{n=1}^{\infty} (1 - q^{2n})(1 + 2q^{2n-1} \cos x + q^{4n-2})$ 

(cf. [7], pp. 64-65). Thus 

$\theta_2(x; q) = c_q \prod_{n=1}^{\infty} P(x + \pi; q^{2n-1}),$ 

where 

$c_q = \prod_{n=1}^{\infty} (1 - q^{2n})$ 

and 

(5) $P(x; r) = 1 - 2r \cos x + r^2, \quad (0 < r < 1),$ 

hence 

$P(x; r) > 0 \quad (0 < r < 1).$ 

Since 0 < q < 1, this proves the case $\lambda = 2$ of (2). Furthermore, 
logarithmic differentiation of the product representation of $\theta_2(x; q)$ 
gives 

$\theta_2'(x; q) = \theta_2(x; q) \sum_{n=1}^{\infty} P'(x + \pi; q^{2n-1}) / P(x + \pi; q^{2n-1}),$ 

where $f' = df/dx$; so that, by (5), 

$P'(x + \pi; r) = -2r \sin x.$ 

Since 0 < q < 1, the last three formula lines and the case $\lambda = 2$ of (2) 
imply that 

$\theta_2'(x; q) < 0$ if $0 < x < \pi$, 

as claimed by the case $\lambda = 2$ of (3). 
This completes the proof of the italicized assertion. 
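
The italicized assertion lends itself to a numerical spot check. The sketch below is an illustration under stated assumptions, not a proof: it truncates the series (1) once the coefficients fall below a tolerance, evaluates it on a finite grid over $0 \le x \le \pi$, and compares the case $\lambda = 2$ with a truncated Jacobi product. The function names, tolerance and grid size are arbitrary choices.

```python
import math


def theta(x, q, lam, tol=1e-16, max_terms=200000):
    """Truncated series 1 + 2 * sum_{n>=1} q**(n**lam) * cos(n*x), as in (1)."""
    total = 1.0
    for n in range(1, max_terms + 1):
        coeff = q ** (n ** lam)
        if coeff < tol:          # coefficients decrease in n, so stop here
            break
        total += 2.0 * coeff * math.cos(n * x)
    return total


def jacobi_product(x, q, factors=200):
    """Truncated Jacobi factorization of the case lam = 2."""
    prod = 1.0
    for n in range(1, factors + 1):
        prod *= (1 - q ** (2 * n)) * (
            1 + 2 * q ** (2 * n - 1) * math.cos(x) + q ** (4 * n - 2)
        )
    return prod


def is_positive_and_decreasing(q, lam, grid=60):
    """Check properties (2) and (3) on an equally spaced grid over [0, pi]."""
    xs = [math.pi * i / grid for i in range(grid + 1)]
    values = [theta(x, q, lam) for x in xs]
    return min(values) > 0 and all(u > v for u, v in zip(values, values[1:]))


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5, 2.0):
        for q in (0.3, 0.6):
            print(lam, q, is_positive_and_decreasing(q, lam))
```

A grid check of this kind cannot replace the argument of §§5-6; it only confirms that sampled values behave as the theorem predicts for the chosen $\lambda$ and q.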


A NOTE ON THE FUNDAMENTAL IDENTITY OF SEQUENTIAL ANALYSIS 


By G. E. ALBERT 
U.S. Naval Ordnance Plant, Indianapolis 


1. Introduction. Let $\{z_i\}$, $(i = 1, 2, 3, \cdots)$, be a sequence of real 
valued random variables identically distributed according to the cumulative 
distribution function F(z). Define the sums 
$Z_N = z_1 + z_2 + \cdots + z_N$ for every positive integer N. Choose two 
positive constants a and b and define the random variable n as the smallest 
integer N for which one of the inequalities $Z_N \ge a$ or $Z_N \le -b$ holds. 
The notations $P(u \mid F)$ and $E(u \mid F)$ will denote the probability of u 
and its expectation respectively assuming that F is the distribution of the 
$z_i$. 

Wald [1] has established the results contained in the following lemmas. 

LEMMA 1. If the variance of F(z) is positive, $P(n < \infty \mid F)$ equals one. 

LEMMA 2. If there exists a positive number $\delta$ such that 
$P(e^z < 1 - \delta \mid F) > 0$ and $P(e^z > 1 + \delta \mid F) > 0$ and if the 
moment generating function $\varphi(t) = E(e^{zt} \mid F)$ exists for all real 
values of t, then $\varphi(t)$ has one and only one minimum at some finite value 
$t = t_0$. Moreover, $\varphi''(t) > 0$ for all real values of t. 

It is the purpose of this note to establish the following extension of the 
validity of certain results given by Wald [1], [2].¹ 

THEOREM. Under the conditions of Lemma 2 the identity 

(1) $E\{e^{Z_n t} [\varphi(t)]^{-n} \mid F\} = 1$ 

is valid and may be differentiated with respect to t under the expectation sign 
any number of times for all real values of t. 

¹ Wald's results show (1) to be valid for all complex t in the domain over which 
$|\varphi(t)| \ge 1$ and the validity of the differentiation clause for all real 
t in that domain. The importance of the present extension arises from the fact 
that, if $E(z \mid F) \ne 0$, then $0 < \varphi(t) < 1$ on a certain interval of 
the real axis. 

PROOF. The notation $t_0$ will be used consistently to denote the t value at 
which $\varphi(t)$ has its minimum. 

The proof of the theorem follows Wald's methods quite closely and certain 
of the results given in [1] and [2] will be used here without discussion. 

Consider first the validity of (1). For an arbitrary positive integer N let 
$P_N$ be the probability $P(n \le N \mid F)$ and let $E_N(u \mid F)$ and 
$E_N^*(u \mid F)$ denote the conditional expectations of u subject to the 
respective conditions $n \le N$ and $n > N$. Wald [1] has shown that for any 
finite real value of t 

(2) $P_N E_N\{e^{Z_n t} [\varphi(t)]^{-n} \mid F\} + (1 - P_N) [\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\} = 1.$ 

Since $\lim_{N = \infty} P_N E_N\{[\varphi(t)]^{-n} \exp(Z_n t)\}$ is the left 
member of the identity (1), it suffices to demonstrate that 

(3) $\lim_{N = \infty} (1 - P_N) [\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\} = 0$ 

for all real values of t. 

Since $1 - P_N$ tends to zero with increasing N and the expected value $E_N^*$ 
involved in (3) is bounded independently of N for any fixed t, the only source of 
difficulty in proving (3) lies in the fact that $\varphi(t)$ may be less than 
unity on an interval of the real axis. That difficulty is easily avoided by the 
following device. Define the function 

(4) $G(x) = [\varphi(t_0)]^{-1} \int_{-\infty}^{x} e^{t_0 z} \, dF(z).$ 

Obviously G(x) is a distribution function whose moment generating function 
$\psi(t)$ exists for all real t. Its mean is zero and its variance is positive as 
will be seen from the equations 
$E(z \mid G) = \varphi'(t_0)/\varphi(t_0)$ and 
$E(z^2 \mid G) = \varphi''(t_0)/\varphi(t_0)$. It follows that $\psi(t)$ is 
never less than unity for real values of t. 

Let $\Omega$ denote the space of all $z_1, \cdots, z_N$ and let 
$\Omega(n > N)$ be that subset of $\Omega$ on which n > N. One has 

$(1 - P_N) [\varphi(t)]^{-N} E_N^*\{e^{Z_N t} \mid F\}$ 

$= \frac{\int_{\Omega(n > N)} e^{Z_N t} \, dF(z_1) \cdots dF(z_N)}{\int_{\Omega} e^{Z_N t} \, dF(z_1) \cdots dF(z_N)} = \frac{\int_{\Omega(n > N)} e^{Z_N s} \, dG(z_1) \cdots dG(z_N)}{\int_{\Omega} e^{Z_N s} \, dG(z_1) \cdots dG(z_N)}$ 

$= (1 - Q_N) [\psi(s)]^{-N} E_N^*\{e^{Z_N s} \mid G\}$ 

where $s = t - t_0$ and $Q_N = P(n \le N \mid G)$. By Lemma 1, $1 - Q_N$ tends 
to zero as N is increased. Thus, since $\psi(s) \ge 1$ for all real t and the 
expected value $E_N^*\{e^{Z_N s} \mid G\}$ is bounded independently of N for a 
fixed t, the equation (3) holds for all real t. 
The differentiability clause of the theorem requires the following modification 
of a very powerful theorem due to Charles Stein [3]. 

LEMMA 3. Under the conditions of Lemma 2, if the minimum $\varphi(t_0)$ of 
$\varphi(t)$ is less than unity, there exists a positive number $t_1$ such that 

(5) $E\{\exp[n t_1 - n \log \varphi(t_0)] \mid F\} < \infty.$ 

PROOF. If G is the distribution of the $z_i$, by Stein's theorem there exists a 
positive number $t_1$ such that $E(e^{n t_1} \mid G)$ is finite. Let 
$\Omega(n = N)$ denote the subset of $\Omega$ on which n = N. Then 

$P(n = N \mid G) = \int_{\Omega(n = N)} dG(z_1) \cdots dG(z_N)$ 

$= [\varphi(t_0)]^{-N} \int_{\Omega(n = N)} e^{t_0 Z_N} \, dF(z_1) \cdots dF(z_N)$ 

$\ge P(n = N \mid F) \exp[\min\{a t_0, -b t_0\} - N \log \varphi(t_0)].$ 

It follows that 

$E\{\exp[n t_1 - n \log \varphi(t_0)] \mid F\} \le E\{e^{n t_1} \mid G\} \exp[-\min\{a t_0, -b t_0\}]$ 

and the lemma is proved. 

To continue with the theorem, Wald's proof [2] suffices for the case in which 
$\varphi(t_0) \ge 1$. Attention will be given only to the case 
$\varphi(t_0) < 1$. As pointed out in section 2 of [2], the differentiability 
clause of the theorem will be established if it can be shown that for any finite 
interval I of the real axis and any pair of integers $r_1$ and $r_2$ there exists 
a function $D_{r_1 r_2}(Z_n, n)$ such that for all t in I one has 

(6) $D_{r_1 r_2}(Z_n, n) \ge |n^{r_1} Z_n^{r_2} e^{Z_n t} [\varphi(t)]^{-n}|$ 

and 

(7) $E\{D_{r_1 r_2}(Z_n, n) \mid F\} < \infty.$ 

On referring to Wald's proof and using the inequality 
$-\log \varphi(t) \le -\log \varphi(t_0)$ for all t in I, it is seen that there 
exists a constant C and a positive number $t_2$ such that the function 

$D_{r_1 r_2}(Z_n, n) = C n^{r_1} [\varphi(t_0)]^{-n} (e^{Z_n t_2} + e^{-Z_n t_2})$ 

satisfies (6) for all t in I. To establish (7) use the inequalities (2.4) and 
(2.6) in Wald [2] to obtain 

(8) $E\{D_{r_1 r_2}(Z_n, n) \mid F\} = C \sum_{N=1}^{\infty} P(n = N \mid F) N^{r_1} [\varphi(t_0)]^{-N} E_{n=N}(e^{Z_n t_2} + e^{-Z_n t_2} \mid F)$ 

$\le C \{e^{a t_2} l(t_2) + e^{b t_2} l(-t_2)\} E\{\exp[r_1 \log n - n \log \varphi(t_0)] \mid F\}.$ 
That (7) is indeed satisfied now follows from (5) and the finiteness of the 
function l(t) since for a large enough integer M one has 

$\sum_{N=M}^{\infty} P(n = N \mid F) \exp[r_1 \log N - N \log \varphi(t_0)]$ 

$\le \sum_{N=M}^{\infty} P(n = N \mid F) \exp[N t_1 - N \log \varphi(t_0)] < \infty.$ 

Thus the expected value on the extreme right in (8) is finite. This completes 
the proof of the theorem. 
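
The identity (1) can be checked numerically in a simple special case, including at the point $t_0$ where $\varphi(t_0) < 1$. The sketch below is an illustration under assumptions that are not in the note: the $z_i$ take the values +1 and -1 with probabilities p and 1 - p, the constants a and b are integers, and the function names are hypothetical. It accumulates the left member of (1) step by step until the surviving weighted mass is negligible.

```python
import math


def mgf(t, p):
    """Moment generating function of a step equal to +1 (prob. p) or -1 (prob. 1 - p)."""
    return p * math.exp(t) + (1.0 - p) * math.exp(-t)


def wald_identity_lhs(t, p=0.6, a=3, b=2, max_steps=5000):
    """Approximate E{exp(Z_n t) * phi(t)**(-n)} for the +1/-1 walk stopped at a or -b."""
    phi = mgf(t, p)
    # weights[pos] = P(Z_N = pos and n > N) * phi**(-N) after N steps
    weights = {0: 1.0}
    total = 0.0
    for _ in range(max_steps):
        nxt = {}
        for pos, w in weights.items():
            for step, prob in ((1, p), (-1, 1.0 - p)):
                new_pos = pos + step
                new_w = w * prob / phi
                if new_pos >= a or new_pos <= -b:
                    total += new_w * math.exp(new_pos * t)   # the walk stops here
                else:
                    nxt[new_pos] = nxt.get(new_pos, 0.0) + new_w
        weights = nxt
        if sum(weights.values()) < 1e-18:   # remainder term of (2) is negligible
            break
    return total


if __name__ == "__main__":
    t0 = 0.5 * math.log(0.4 / 0.6)   # minimizer of mgf(t, 0.6)
    for t in (-1.0, t0, 0.0, 0.5):
        print(t, mgf(t, 0.6), wald_identity_lhs(t))
```

In this example the surviving weighted mass plays the role of the second term in (2); its decay at $t = t_0$ is the content of equation (3).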
A SIGNIFICANCE TEST AND ESTIMATION IN THE CASE OF 
EXPONENTIAL REGRESSION 


By D. S. Villars 


United States Rubber Company, Passaic, N. J. 


1. Introduction. The principal problem under consideration in this note 
may be described as follows. Consider a variate, z, whose distribution for a 
given value of a fixed variate, t, is: 

(1.1) $f(z \mid t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(z - a + b e^{-kt})^2}{2\sigma^2}\right]$ 

where a, b, and k are real-valued parameters. The regression of z on t is 
exponential, for it follows from (1.1) that the expected value of z, given t, is: 

(1.2) $E(z \mid t) = a - b e^{-kt}.$ 

On the basis of a random sample 
$O_N(z_1, t_1;\ z_2, t_2;\ \cdots;\ z_N, t_N)$ it is desired to test whether 
k = 0 or $\infty$. The problem of "fitting" a curve, $z = a - b e^{-kt}$, to the 
sample (i.e., of estimating a, b, and k from the sample) will also be treated. 

As an illustration of how the statistical problems described above arise in 
practice, let us consider a typical situation in industrial chemistry. Let the 
quantity, z, be a property of a latex and let the quantity, t, be time. Suppose, 
furthermore, that measurement of t is without error but that measurement of z 
is subject to error; let it be assumed that the observed value in a measurement of 
z is a variate having a normal (Gaussian) distribution about the "true value," 
E(z). On the basis of N independent measurements, $z_1, z_2, \cdots, z_N$ of z at 
times $t_1, t_2, \cdots, t_N$, respectively, the experimenter may wish to test 
the hypothesis that k = 0 or $\infty$. If this hypothesis is true the suspected 
exponential relation between z and t does not hold; in this case E(z) is a 
constant (a - b, or a) and estimation of the constant from the data is quite 
straightforward. If the data conflict with the hypothesis that k = 0 or 
$\infty$, the experimenter may wish to estimate the parameters a, b, and k 
(i.e., "fit" the curve, $z = a - b e^{-kt}$, to the data). 

The problems considered in this note will be treated only for the case where $N$ 
is an even integer (> 6) and the times $t_1, t_2, \cdots, t_N$ at which measurements of 
$z$ are made are such that 

$$t_{2\alpha} - t_{2\alpha-1} = \Delta, \text{ a constant}, \quad (\alpha = 1, 2, \cdots, n = N/2). \tag{1.3}$$

The odd time intervals, $t_3 - t_2$, $t_5 - t_4$, etc. do not have to be equal. 


2. Test of the hypothesis that k = 0 or ∞. The space, say $\Omega$, of admissible 
values of the parameters in (1.1) is: $\sigma^2 > 0$, $-\infty < a, b, k < +\infty$. Under the 
null hypothesis the admissible values of the parameters lie in a subspace of $\Omega$, 
say $\omega$, specified as follows: $\sigma^2 > 0$, $-\infty < a, b < +\infty$, $k = 0$, or $\infty$. 

Let $y_\alpha = z_{2\alpha}$ and $x_\alpha = z_{2\alpha-1}$, $(\alpha = 1, \cdots, n = N/2)$. From (1.1) and (1.3) 
it follows that the $n$ pairs $x_j, y_j$ are normally and independently distributed with 
common variance, $\sigma^2$, that $x_j$ and $y_j$ are independent $(j = 1, 2, \cdots, n)$, and 
that 

$$\nu_j = h + m\mu_j \tag{2.1}$$

where $\nu_j = E(y_j)$, $\mu_j = E(x_j)$, $h = a(1 - e^{-k\Delta})$, and $m = e^{-k\Delta}$. The space, 
$\Omega'$, of admissible values of the parameters in the joint distribution of $x_j, y_j$, 
$(j = 1, \cdots, n)$, is: $\sigma^2 > 0$, $\nu_j = h + m\mu_j$, $-\infty < h < +\infty$, $-\infty < \mu_j < +\infty$; 
$0 < m < \infty$. The subspace of $\Omega'$, say $\omega'$, associated with the null hypothesis 
is: $\sigma^2 > 0$, $\nu_j = \mu_j = c$, where $c = a - b$ or $a$ according as $k = 0$ or $\infty$. 
In $\Omega'$, the expected values of $x$ and $y$ lie on a line; in $\omega'$ they lie in a single point. 
It is clear that by transforming the original sample $O_N(z_1, t_1; \cdots; z_N, t_N)$ to a 
sample $O_n(x_1, y_1; \cdots; x_n, y_n)$ we have reduced the original problem to the 
familiar problem of linear regression in which there is “error in both variates”. 

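The slope of the “line of best fit” to the sample points $(x_1, y_1; \cdots; x_n, y_n)$ 
is [1]: 

$$\hat{m} = \left[S_{yy} - S_{xx} + \sqrt{(S_{yy} - S_{xx})^2 + 4S_{xy}^2}\,\right] / 2S_{xy} \tag{2.2}$$

where 

$$S_{xx} = \sum_j (x_j - \bar{x})^2, \quad S_{xy} = \sum_j (x_j - \bar{x})(y_j - \bar{y}), \quad S_{yy} = \sum_j (y_j - \bar{y})^2,$$

$$\bar{x} = \sum_j x_j / n, \quad \bar{y} = \sum_j y_j / n. $$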

($\hat{m}$ is an estimate of $m$ in (2.1)). Since $m = e^{-k\Delta}$ (where $k$ and $\Delta$ are real), it is 
intuitively clear that when $\hat{m}$ is non-positive the sample $O_n$ does not conflict 
with the null hypothesis. The null hypothesis can be tested by means of the 
statistic [2, 144] 

$$F' = \frac{S_{xx} + 2\hat{m}S_{xy} + \hat{m}^2 S_{yy}}{\hat{m}^2 S_{xx} - 2\hat{m}S_{xy} + S_{yy}}. \tag{2.3}$$

The null hypothesis is rejected if $\hat{m}$ is positive and $F'$ is large. Percentage points 
of the distribution of $F'$ are given in [2, 146] for n = 3 (1) 15 (5) 30, 40, 60, 120 
and for significance levels .001, .01, .05, .10, and .20. These significance 
levels, however, were computed for use in cases where the sign of $\hat{m}$ was 
irrelevant. It happens that to test the null hypothesis under consideration in this 
problem at a significance level $\alpha$ we should use a critical value of $F'$ (given in 
[2]) corresponding to a significance level $2\alpha$. The reason for this is that when 
the null hypothesis is true the quantities $\hat{m}$ and $F'$ are independent and the 
probability that $\hat{m}$ is positive is ½; thus the chance of rejecting the null 
hypothesis is ½$(2\alpha) = \alpha$. 
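
The decision rule just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's computation: the function names are hypothetical, the form used for F' (sum of squares along the fitted line divided by the sum of squares across it) is our reading of (2.3), and the critical value has to be looked up by the user in the table of [2] at level 2α.

```python
import math

def slope_of_best_fit(sxx, sxy, syy):
    """Slope m-hat of equation (2.2); assumes sxy is not zero."""
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)

def f_prime(sxx, sxy, syy):
    """Assumed form of (2.3): ratio of the sum of squares along the
    fitted line to the sum of squares perpendicular to it."""
    m = slope_of_best_fit(sxx, sxy, syy)
    along = sxx + 2.0 * m * sxy + m * m * syy
    across = m * m * sxx - 2.0 * m * sxy + syy
    return along / across

def reject_null(sxx, sxy, syy, critical_value_at_2alpha):
    """Reject k = 0 or infinity only if m-hat is positive AND F' is large.
    The critical value is the tabled F' point for level 2*alpha."""
    m = slope_of_best_fit(sxx, sxy, syy)
    return m > 0 and f_prime(sxx, sxy, syy) > critical_value_at_2alpha

# Sums of squares and products quoted in the example of section 4.
m_hat = slope_of_best_fit(0.035510, 0.025645, 0.023414)
decision = reject_null(0.035510, 0.025645, 0.023414, 16.5)
```

Passing the tabled point for level 2α rather than α is what compensates for discarding the half of the null distribution in which the slope estimate is negative.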


3. Estimation of a, b, and k. If the data do not support the hypothesis that 
$k = 0$ or $\infty$, the experimenter may wish to estimate $a$, $b$, and $k$. General 
alternative methods of estimating these parameters will now be considered. 

(1) Estimate $a$, $b$, and $k$ from $O_N$ by the method of least squares; i.e., solve 
the simultaneous equations $\partial S/\partial a = 0$, $\partial S/\partial b = 0$, and $\partial S/\partial k = 0$ for $a$, $b$, 
and $k$, where 

$$S = \sum_{i=1}^{N} (z_i - a + b e^{-kt_i})^2. \tag{3.1}$$

The value of $k$ obtained by this method of estimation will not in general be the 
same as that computable from $\hat{m}$ in (2.2) and used for the significance testing. 

(2) Estimate $k$ by means of (2.2) and the relation $m = e^{-k\Delta}$, then substitute 
this estimate into $S$ of (3.1) and estimate $a$ and $b$ by means of least squares. 
(3) Estimate $k$ as in (2) and choose, as an estimate of $a$, the intercept of the 
“line of best fit” for $O_n$. Then substitute these estimates of $a$ and $k$ into (3.1) 
and estimate $b$ by means of least squares. In this case the estimate of $b$ comes 
out to be: 

$$\hat{b} = \sum_i e^{-\hat{k}t_i}(\hat{a} - z_i) \Big/ \sum_i e^{-2\hat{k}t_i} \tag{3.2}$$

where $\hat{a}$ and $\hat{k}$ are the estimates of $a$ and $k$. 
If the values $t_1, t_2, \cdots, t_N$ are such that $t_{i+1} - t_i = \Delta$, $(i = 1, 2, \cdots, N - 1)$, 
the following estimation procedure might be used. 


(4) Let 

$$y_i = z_{i+1}, \quad x_i = z_i, \quad (i = 1, 2, \cdots, N - 1),$$

and treat the $(N - 1)$ pairs of values $(x_1, y_1; \cdots; x_{N-1}, y_{N-1})$ as a sample of 
size $(N - 1)$. Using this sample, estimate $k$, $a$, and $b$ in a manner similar to 
that in (2) or (3). It should be noted that this sample is not a random sample 
owing to the dependence among the $(N - 1)$ elements. 

The procedure in alternative (1) is very laborious and time-consuming. The 
procedure in (2) and (3) can be carried out quickly and easily. In (1) the 
method of least squares yields the same results as would be obtained from appli- 
cation of the method of maximum likelihood. Examples of estimation by proce- 
dures (3) and (4) are given in the next section. 


4. Example. The accompanying table lists experimentally observed values 
of a property of a latex obtained at biweekly intervals. Using the first, third, 
etc., quantities as $x_j$ and the remaining ones as $y_j$, the sums of squares and 
products of deviations are found to be: 

    S_xx = .035510    x̄ = 0.9195 
    S_xy = .025645 
    S_yy = .023414    ȳ = .9365. 

Substituting these values in equation (2.2) and computing the other constants 
from equation (2.1) we get: $\hat{m} = 0.791596$, $\hat{a} = 1.0009$, and $\hat{k} = 0.1168$. The 
$F'$ ratio (2.3) is 17.03. Entering Table I of [2], we find that for eight point pairs 
a value of $F' = 16.5$ may be expected only one time in one hundred. On 
excluding the possibility of negative values of $\hat{m}$, this corresponds to the 0.5% 
significance level. The exponential relationship is thus concluded to be highly 
significant. 
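
The arithmetic of this example can be retraced with a short Python sketch. It assumes the sixteen values of Table I are read in time order (t = 1, 3, ..., 31 weeks), so that the spacing within each pair is Δ = 2 weeks, and it takes the fitted line to pass through the point of means; the function name `fit_pairs` is hypothetical.

```python
import math

# Observed values from Table I, in time order (t = 1, 3, ..., 31 weeks).
Z = [0.776, 0.852, 0.850, 0.869, 0.939, 0.904, 0.930, 0.948,
     0.942, 0.938, 0.979, 0.975, 0.955, 0.993, 0.985, 1.013]
DELTA = 2.0  # weeks between the two members of each (x, y) pair

def fit_pairs(x, y, delta):
    """Slope (2.2) of the line of best fit and the constants derived via (2.1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    d = syy - sxx
    m = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    h = ybar - m * xbar        # intercept, line assumed to pass through the means
    a = h / (1.0 - m)          # from h = a(1 - m)
    k = -math.log(m) / delta   # from m = exp(-k * delta)
    return {"sxx": sxx, "sxy": sxy, "syy": syy, "m": m, "a": a, "k": k}

x = Z[0::2]  # first, third, ... observations
y = Z[1::2]  # second, fourth, ... observations
result = fit_pairs(x, y, DELTA)
```

The sums of squares and products agree with the values quoted above; the estimate of a agrees to about three decimals, the remaining difference being attributable to rounding of the means.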

Evaluation of $b$ by equation (3.2), method 3, gives 0.2560, if all 16 values are 
used. The equation calculated from the data is thus: 

$$z = 1.0009 - 0.2560\, e^{-0.1168t}. \tag{4.1}$$
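
As a check on equation (3.2), the following Python sketch evaluates the least-squares estimate of b from the sixteen observations, with a and k held at the quoted estimates. The observation times t = 1, 3, ..., 31 weeks are read from Table I, and the helper name `estimate_b` is hypothetical.

```python
import math

def estimate_b(z, t, a_hat, k_hat):
    """Least-squares estimate of b for fixed a and k, equation (3.2)."""
    num = sum(math.exp(-k_hat * ti) * (a_hat - zi) for zi, ti in zip(z, t))
    den = sum(math.exp(-2.0 * k_hat * ti) for ti in t)
    return num / den

Z = [0.776, 0.852, 0.850, 0.869, 0.939, 0.904, 0.930, 0.948,
     0.942, 0.938, 0.979, 0.975, 0.955, 0.993, 0.985, 1.013]
T = list(range(1, 32, 2))  # 1, 3, ..., 31 weeks

b_hat = estimate_b(Z, T, a_hat=1.0009, k_hat=0.1168)
```

With the rounded inputs shown, the result is close to the quoted 0.2560; small differences in the last digits are to be expected.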
The alternative procedure, method 4, would be to use all the $z_i$ points for the 
estimation of $a$ and $k$. This leads to the following values of the computation 
quantities: 

$$S_{xx} = \sum_{i=1}^{16} z_i^2 - z_{16}^2 = 0.052374; \quad \bar{x} = 0.9223$$

$$S_{xy} = \sum_{i=1}^{15} z_i z_{i+1} = .036924$$

$$S_{yy} = \sum_{i=1}^{16} z_i^2 - z_1^2 = .035436; \quad \bar{y} = .9381.$$

Note that the difference $S_{yy} - S_{xx}$ used in the formula for $\hat{m}$ cancels out all 
intervening squares between the first and last: 

$$S_{yy} - S_{xx} = z_{16}^2 - z_1^2.$$


TABLE I 

    t        z        t        z        t        z        t        z 
  weeks             weeks             weeks             weeks 
    1      .776       9      .939      17      .942      25      .955 
    3      .852      11      .904      19      .938      27      .993 
    5      .850      13      .930      21      .979      29      .985 
    7      .869      15      .948      23      .975      31     1.013 


However, the data excluded thereby are in effect included in the new $S_{xy}$. 
The final values obtained by the fourth procedure are: $\hat{m} = 0.796596$, 
$\hat{a} = 1.0000$, and $\hat{k} = 0.1137$. The writer does not know whether the peculiar 
transference of data from $S_{yy} - S_{xx}$ to $S_{xy}$ characteristic of procedure 4 improves 
the accuracy of the fit or hurts it. It is his personal preference to use procedure 3. 
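
Procedure 4 can be retraced in the same way. The sketch below assumes that the sums of squares and products are taken about the means of the overlapping pairs (x_i, y_i) = (z_i, z_{i+1}), which is what reproduces the quoted values numerically; the function name is hypothetical.

```python
import math

Z = [0.776, 0.852, 0.850, 0.869, 0.939, 0.904, 0.930, 0.948,
     0.942, 0.938, 0.979, 0.975, 0.955, 0.993, 0.985, 1.013]
DELTA = 2.0  # weeks between successive observations

def fit_overlapping_pairs(z, delta):
    """Procedure 4: pair each observation with its successor, then apply (2.2)."""
    x, y = z[:-1], z[1:]
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    d = syy - sxx
    m = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    a = (ybar - m * xbar) / (1.0 - m)   # from h = a(1 - m)
    k = -math.log(m) / delta            # from m = exp(-k * delta)
    return {"sxx": sxx, "sxy": sxy, "syy": syy, "m": m, "a": a, "k": k}

fit4 = fit_overlapping_pairs(Z, DELTA)
```

Note that the fifteen pairs share observations, so, as the text remarks for alternative (4), they do not form a random sample.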


5. Acknowledgement. The writer wishes to express his gratitude to 
Drs. T. W. Anderson, Jr. and David F. Votaw, Jr. for many suggestions 
and discussions concerning this problem and for much help in clarifying 
the presentation of the concepts. 
ON THE POWER EFFICIENCY OF A t-TEST FORMED 
BY PAIRING SAMPLE VALUES 


By JOHN E. WALSH 


Princeton University 


1. Introduction. Consider two equal sized samples, one from a normal 
population with mean $\mu$ and the other from a normal population with mean $\nu$. Let 
$x_1, \cdots, x_n$ be the sample values from the population with mean $\mu$ and $y_1, \cdots, y_n$ 
the values from the population with mean $\nu$. If the two populations have the 
same variance and the two samples are independent, the most powerful tests 
for comparing $\mu$ and $\nu$ using these samples (one-sided and symmetrical 
two-sided) are based on the statistic 

$$t_2 = \frac{[\bar{x} - \bar{y} - (\mu - \nu)]\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

which has a Student t-distribution with $2n - 2$ degrees of freedom. Tests based 
on $t_2$ also have the desirable property of being invariant under permutation of 
the data in each sample. 

Sometimes, however, it is useful to combine the sample values in the form 

$$z_i = (x_i - y_i), \quad (i = 1, \cdots, n).$$

Examples: 

(a). When the samples are independent but it is not known that the two 
populations have the same variance (Behrens-Fisher problem). 

(b). When there may be correlation between $x_i$ and $y_i$, $(i = 1, \cdots, n)$, 
this correlation being the same for each value of $i$ (i.e. $x_i$ is independent of $y_j$ 
if $i \neq j$ while each pair $x_i, y_i$, $(i = 1, \cdots, n)$, has the same normal bivariate 
distribution). 

In both (a) and (b) the $z_i$ are independently normally distributed with the 
same variance and mean $\mu - \nu$. 

The Student t-test for comparing $\mu$ and $\nu$ using the $z_i$ is based on the statistic 

$$t_1 = \frac{[\bar{z} - (\mu - \nu)]\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n}(z_i - \bar{z})^2}} = \frac{[\bar{x} - \bar{y} - (\mu - \nu)]\sqrt{n(n-1)}}{\sqrt{\sum_{i=1}^{n}[x_i - y_i - (\bar{x} - \bar{y})]^2}}$$

which has a Student t-distribution with $n - 1$ degrees of freedom. These tests 
are not invariant under permutation of the data in each sample. 

If it is true that all the sample values are independently distributed with the 
same variance $\sigma^2$, efficiency will be lost by using the test based on $t_1$ instead of 
the most powerful test based on $t_2$. The purpose of this note is to determine the 
power efficiency of the tests based on $t_1$ as compared with the corresponding 
tests based on $t_2$ for this case. 
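
The two statistics compared in this note can be written out as a short Python sketch. The function names and the sample values are hypothetical and serve only to illustrate the definitions; `diff` stands for the hypothesized value of μ − ν.

```python
import math

def t2_statistic(x, y, diff=0.0):
    """Two-sample statistic with pooled sum of squares (2n - 2 degrees of freedom)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss = sum((xi - xbar) ** 2 for xi in x) + sum((yi - ybar) ** 2 for yi in y)
    return (xbar - ybar - diff) * math.sqrt(n * (n - 1)) / math.sqrt(ss)

def t1_statistic(x, y, diff=0.0):
    """Paired statistic formed from z_i = x_i - y_i (n - 1 degrees of freedom)."""
    n = len(x)
    z = [xi - yi for xi, yi in zip(x, y)]
    zbar = sum(z) / n
    ss = sum((zi - zbar) ** 2 for zi in z)
    return (zbar - diff) * math.sqrt(n * (n - 1)) / math.sqrt(ss)

# Hypothetical data, used only to illustrate the invariance remark.
x = [5.1, 4.9, 6.0, 5.5]
y = [4.0, 4.2, 5.1, 4.4]
y_permuted = list(reversed(y))

t2_same = abs(t2_statistic(x, y) - t2_statistic(x, y_permuted)) < 1e-9
t1_same = abs(t1_statistic(x, y) - t1_statistic(x, y_permuted)) < 1e-9
```

Reordering one sample leaves the pooled statistic unchanged but alters the pairing, and hence the paired statistic, which is the lack of invariance noted above.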

















TABLE I 
Power Function Values for the t_1 and t_2 Tests 

                  Approx.               Approx. Values of Power Function 
 Test     n      Efficiency     α      δ = ½     δ = 1     δ = 1½    δ = 2 

 t_1      6        87%         .05     .276      .674      .933      .994 
 t_2      5.2                  .05     .275      .672      .932      .994 

 t_1      6        82.5%       .025    .159      .486      .822      .970 
 t_2      4.95                 .025    .160      .488      .823      .970 

 t_1      8        90%         .05     .355      .812      .985 
 t_2      7.2                  .05     .354      .813      .985 

 t_1      8        86.5%       .025    .226      .674      .952      .998 
 t_2      6.9                  .025    .225      .675      .951      .998 

 t_1      8        82%         .01     .112      .458      .843      .983 
 t_2      6.55                 .01     .112      .457      .842      .983 

 t_1     10        92%         .05     .425      .898      .997 
 t_2      9.2                  .05     .425      .897      .997 

 t_1     10        90%         .025    .289      .892      .988 
 t_2      9                    .025    .299      .803      .988 

 t_1     10        85.5%       .01     .159      .626      .950      .999 
 t_2      8.55                 .01     .159      .627      .950      .999 

 t_1     15        95.5%       .05     .579      .980 
 t_2     14.3                  .05     .579      .980 

 t_1     15        93%         .025    .437      .950     1.000 
 t_2     13.95                 .025    .437      .949     1.000 

 t_1     15        90%         .01     .278      .876      .998 
 t_2     13.5                  .01     .278      .876      .998 

 t_1     25        98%         .05     .784      .999 
 t_2     24.5                  .05     .784      .999 

 t_1     25        96%         .025    .670      .998 
 t_2     24                    .025    .670      .998 

 t_1     25        94.5%       .01     .514      .992 
 t_2     23.7                  .01     .514      .992 

Consideration is limited to one-sided tests, which is not a serious limitation 
since any two-sided test can be considered as a combination of two one-sided 
tests. Table II contains approximate power efficiencies of one-sided tests for 
$n \geq 4$ at the significance levels $\alpha$ = .05, .025, .01. 

It is found that the efficiency of the $t_1$ test increases with the sample size but 
is high even for small size samples. 

2. Outline of computations. The method of obtaining power efficiencies 
used here will be that outlined in [1]. Essentially this consists in computing the 
power function for the test based on $t_1$ and then adjusting the sample size for 
the corresponding test based on $t_2$ until its power function is approximately the 
same as for the $t_1$ test. The ratio of the sample size (perhaps fractional) of the 
adjusted $t_2$ test to that of the $t_1$ test is called the power efficiency of the $t_1$ test. 
Intuitively this efficiency measures the fraction of the total available information 
which is being used when the $t_1$ test is applied (since the $t_2$ test is most powerful). 


TABLE II 
Approximate Power Efficiencies for Given n and α 

  α \ n     4       5       6       7       8       9      10      15      25      ∞ 

  .05     82.5%   85%     87%     88.5%   90%     91%     92%     95.5%   98%     100% 
  .025    77%*    80%*    82.5%   84.5%   86.5%   88.5%   90%     93%     96%     100% 
  .01     73%     75.5%   78%     80%     82%     84%     85.5%   90%     94.5%   100% 

* These values were obtained by comparison with the corresponding values for 
α = .05 and .01. 


It is easily seen from symmetry that a one-sided $t_1$ test of $\mu < \nu$ has the same 
power efficiency as the corresponding one-sided $t_1$ test of $\mu > \nu$. Thus it is 
sufficient to consider the one-sided tests of $\mu > \nu$. 

The power function is found as a function of the parameter $\delta$, where 

$$\delta = \frac{\mu - \nu}{\sigma\sqrt{2}}.$$

Most of the approximate power efficiencies were determined by using the 
normal approximation given in [2] to compute the power function values. This 
approximation was used for fractional values of $n$. Table I contains the results 
of these computations for one-sided tests of $\mu > \nu$. 

Exact values of the power function for integral values of $n$ and $\alpha$ = .05, .01 
can be found from the tables in [3]. A comparison of the power function values 
obtained from the normal approximation with these exact values shows that, 
for $n \leq 6$, $\alpha$ = .01 and $n \leq 4$, $\alpha$ = .05, .025, the approximation underestimates 
the true values for small $\delta$ and overestimates for large $\delta$. Although this 
combination of underestimation and overestimation tends to cancel out in the 
determination of power efficiencies, so that little error in power efficiencies would be 
expected if the approximation were used for $n = 6$, $\alpha$ = .01 or $n = 4$, $\alpha$ = .05, 
the efficiencies given in Table II for $n = 4$, $\alpha$ = .05 and $n = 4, 6$, $\alpha$ = .01 were 
obtained from the exact values by graphical interpolation and cross-interpolation. 

Power efficiencies were not considered for $n < 4$ because of the difficulties 
of interpolation and the inexactness of the normal approximation in this range. 

For $n = \infty$, $t_1$ and $t_2$ both have a normal distribution with zero mean and unit 
variance. Thus the power efficiency is 100% at all significance levels for 
this case. 

These computations furnish approximate power efficiencies for $n$ = 6, 8, 10, 
15, 25, $\infty$ at $\alpha$ = .05, .025, .01, and for $n = 4$ at $\alpha$ = .05 and .01. The other 
approximate power efficiencies listed in Table II were obtained by graphical 
interpolation from these values. 

The results of this note can be roughly summarized for n < 15 by stating 
that of the $2n$ sample values 

(i). approximately 1.6 values are lost at the 5% significance level, 

(ii). approximately 2.1 values are lost at the 2.5% significance level, 

(iii). approximately 2.8 values are lost at the 1% significance level, 

if the tests based on $t_1$ are used instead of the corresponding tests based on $t_2$. 
Examination of Table I shows that the number of sample values lost decreases as n 
increases for n > 15. 
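
The rough summary above can be checked against Table II with a few lines of Python. This is only an illustrative sketch: the number of values lost is taken here to be 2n(1 − efficiency), the column labels n = 4, 5, ..., 25 are our reading of the table header, and the names are hypothetical.

```python
# Sample sizes read as the column labels of Table II (an assumption),
# omitting the limiting column n = infinity.
N_VALUES = [4, 5, 6, 7, 8, 9, 10, 15, 25]

# Approximate power efficiencies (percent) from Table II, keyed by alpha.
TABLE_II = {
    0.05:  [82.5, 85.0, 87.0, 88.5, 90.0, 91.0, 92.0, 95.5, 98.0],
    0.025: [77.0, 80.0, 82.5, 84.5, 86.5, 88.5, 90.0, 93.0, 96.0],
    0.01:  [73.0, 75.5, 78.0, 80.0, 82.0, 84.0, 85.5, 90.0, 94.5],
}

def values_lost(n, efficiency_percent):
    """Number of the 2n sample values effectively lost by using the paired test."""
    return 2 * n * (1.0 - efficiency_percent / 100.0)

lost = {
    alpha: [round(values_lost(n, eff), 2) for n, eff in zip(N_VALUES, effs)]
    for alpha, effs in TABLE_II.items()
}
```

For example, at the 5% level with n = 8 the efficiency of 90% corresponds to 16 × 0.10 = 1.6 values lost, in line with item (i).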
NOTE ON THE LIAPOUNOFF INEQUALITY FOR ABSOLUTE MOMENTS 
By MAURICE H. BELZ 
The University of Melbourne 


For a variate $x$ measured from the mean of the population, the absolute 
moment of order $r$ is defined by 

$$\nu_r = \int_{-\infty}^{\infty} |x|^r \, dF(x),$$

where $F(x)$ is the cumulative distribution function. Treating $r$ as continuous, 
we have 

$$\frac{d\nu_r}{dr} = \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x),$$

the integral on the right existing if $\nu_{r+1}$ exists. 

Write $y = \log_e \nu_r$. Then we have 

$$\frac{dy}{dr} = \frac{1}{\nu_r} \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x),$$

$$\nu_r^2 \frac{d^2 y}{dr^2} = \int_{-\infty}^{\infty} |x|^r \, dF(x) \cdot \int_{-\infty}^{\infty} |x|^r \log_e^2 |x| \, dF(x) - \left\{ \int_{-\infty}^{\infty} |x|^r \log_e |x| \, dF(x) \right\}^2 \geq 0,$$

by Schwarz’s inequality. 

FIG. 1 


It follows that the function $y$ is convex (or exceptionally a straight line), and, 
on referring to the figure, it appears that 

$$MQ \leq MQ' \tag{1}$$

for all chords $PR$. If the abscissae of the points $L$, $M$, $N$ are $c$, $b$, $a$, respectively, 
where $c \leq b \leq a$, the inequality (1) leads at once to the relation 

$$\log_e \nu_b \leq \frac{a - b}{a - c} \log_e \nu_c + \frac{b - c}{a - c} \log_e \nu_a.$$

Hence 

$$\nu_b^{\,a-c} \leq \nu_c^{\,a-b} \, \nu_a^{\,b-c},$$

which is the usual form of the Liapounoff Inequality. 
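
A numerical illustration of the inequality, for the special case of an equally weighted discrete distribution, can be sketched in Python as follows. The data and function names are hypothetical; the moments are taken about the mean, as in the definition above.

```python
def abs_moment(values, r):
    """Absolute moment of order r about the mean for equally weighted values."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) ** r for v in values) / len(values)

def liapounoff_holds(values, a, b, c, rel_tol=1e-12):
    """Check nu_b^(a-c) <= nu_c^(a-b) * nu_a^(b-c) for orders c <= b <= a."""
    if not (c <= b <= a):
        raise ValueError("orders must satisfy c <= b <= a")
    lhs = abs_moment(values, b) ** (a - c)
    rhs = abs_moment(values, c) ** (a - b) * abs_moment(values, a) ** (b - c)
    return lhs <= rhs * (1.0 + rel_tol)

sample = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical values
check_421 = liapounoff_holds(sample, a=4, b=2, c=1)
check_321 = liapounoff_holds(sample, a=3, b=2, c=1)
```

The small relative tolerance only guards against rounding when the two sides are equal, as happens when b coincides with a or c.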
REMARK ON THE NOTE “A GENERALIZATION OF 
WARING’S FORMULA” 


By T. N. E. GREVILLE 
U.S. Public Health Service 


Before submitting for publication the note “A generalization of Waring’s 
formula,” Annals of Math. Stat., Vol. 15 (1944), pp. 218-219, the author made a 
diligent effort to ascertain, through correspondence with mathematicians and 
actuaries both in this country and abroad, whether the generalized formula in 
question had been previously published, and none of the authorities 
communicated with knew of its prior publication. However, it has now come to his 
attention that the formula was published in essentially the same form by Hermite 
in the article “Sur la formule d’interpolation de Lagrange”, Journal für die 
Reine und Angewandte Mathematik (“Crelle’s Journal”), Vol. 84 (1878), 
pp. 70-79. 
1. Estimation of Parameters in Truncated Pearson Frequency Distributions. 
A. C. COHEN, University of Georgia. 


Given a truncated univariate Pearson frequency distribution, parameters of the com- 
plete distribution are required. Karl Pearson and Alice Lee, (Biometrika, Vol. 6 (1915), 
pp. 59-69) and R. A. Fisher, (Introduction to Mathematical Tables, Vol. 1, British Assn. 
Adv. Sci., 1931, pp. xxvi-xxxv), obtained solutions of the truncated normal distribution 
with a single tail missing. The present paper presents three general methods of solution 
applicable to any of the Pearson distributions. The first utilizes moments of a higher order 
than are required to characterize corresponding complete distributions. The order of 
the highest moment required is increased by one for each missing tail. The second method, 
applicable when only a single tail is missing, utilizes the terminal ordinate at the point of 
truncation and moments of the same order as required to characterize the complete dis- 
tribution. The terminal ordinate is evaluated by successive approximations. The third 
method utilizes only the first two moments, but requires that the given distribution be 
further truncated and that moments be computed both before and after the additional 
truncations. This latter method can also be applied to complete distributions to avoid 
direct computation of third and fourth order moments. 


2. Distribution of a Root of Determinantal Equation. D. N. NANDA, University 
of North Carolina. 


The joint distribution of the roots of a determinantal equation was given by P. L. Hsu 
in 1939 and the distribution of any one of the roots was studied by S. N. Roy. The present 
paper, however, gives a different method of working out the distribution of any root, 
specified by its place in a monotonic arrangement. This method enables us to express the 
distribution of a root of a certain determinantal equation in terms of a linear combination 
of products of incomplete beta integrals and in terms of the distribution of a root of lower- 
order determinantal equations. 


3. The Power of Certain Non-Parametric Tests of Independence. WASSILY 
HOEFFDING, University of North Carolina. 


Several tests of independence have been proposed which are based on statistics depending 
only on the ranks of the sample values. Under the hypothesis Ho of independence the 
distribution of such statistics does not depend on the form of the parent distribution. 
Two of these statistics, Spearman’s rank correlation coefficient and Lindeberg-Kendall’s 
statistic based on the number of inversions in the permutation of the ranks, are shown to 
be asymptotically normally distributed in samples from any population (the limiting nor- 
mal distribution being singular in certain degenerate cases). The asymptotic distribution 
of these coefficients reveals that the corresponding tests of independence are inconsistent 
(in the sense that the probability of rejecting Ho does not necessarily tend to 1 if Ho is not 
true), and at least one of them is biased in the limit. It can be shown that at least for some 
sample sizes and some sizes of the critical region there do not exist unbiased tests of inde- 
pendence based on ranks. But there do exist rank tests of independence which are con- 
sistent, and hence unbiased in the limit. Examples of such tests are given. 


4. Some Significance Tests for the Mean Using the Sample Range and Midrange. 
JOHN E. WALSH, Princeton University. 

Consider a sample of size $n$, (2 < n < 10), drawn from a normal population with mean $\mu$. 
Let $x_n$ be the largest value and $x_1$ the smallest value of the sample. Significance tests are 
developed to compare $\mu$ with a given hypothetical value $\mu_0$ by use of the sample. These 
significance tests are based on the quantity $D = [\tfrac{1}{2}(x_1 + x_n) - \mu_0]/(x_n - x_1)$ = [(sample 
midrange) − (hypothetical mean)]/(sample range). One-sided and symmetrical tests are 
considered. Values of $D_\alpha$ such that $\Pr(D > D_\alpha \mid \mu = \mu_0) = \alpha$ are computed for $\alpha$ = .05, .025, 
.01, .005. These values of $D_\alpha$ can be used to obtain one-sided tests at the .05, .025, .01, .005 
significance levels and symmetrical tests at the .10, .05, .02, .01 significance levels. 
Efficiencies are computed for one-sided tests at the .05 and .01 significance levels. The 
efficiency is at least 90% for n < 6 at the .05 significance level and for n < 8 at the .01 level. 
The range-midrange test can be applied without computation through the use of an easily 
constructed graph. The application of a test requires only the plotting of the sample 
point $(x_1, x_n)$ on this graph. 
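
The quantity D of this abstract is simple to compute; a minimal Python sketch follows. The function name and the sample are hypothetical, and the critical values D_α are not given in the abstract, so they are left to the user.

```python
def range_midrange_statistic(sample, mu0):
    """D = (sample midrange - hypothetical mean) / (sample range)."""
    smallest, largest = min(sample), max(sample)
    if largest == smallest:
        raise ValueError("sample range is zero; D is undefined")
    midrange = 0.5 * (smallest + largest)
    return (midrange - mu0) / (largest - smallest)

# Hypothetical sample; a one-sided test would compare d_value with a
# tabled critical value D_alpha supplied by the user.
d_value = range_midrange_statistic([2.0, 3.5, 5.0], mu0=3.0)
```

Only the two extreme order statistics enter the computation, which is why the test can also be applied graphically by plotting the point formed by the smallest and largest values.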


5. Testing Compound Symmetry in a Normal Multivariate Distribution. DAVID 
F. VOTAW, JR., Princeton University. 


Let $F(X)$ be the d.f. of a t-order vector variate $X$ ($t \geq 3$). Suppose the components of $X$ 
are divided into mutually exclusive and exhaustive subsets. F(X) is said to be compound 
symmetric, for the given division of its variates into subsets, if it is invariant over all per- 
mutations of its variates within these subsets. F(X) is completely symmetric if the invari- 
ance holds over all permutations of its variates. If F(X) is normal and compound sym- 
metric, then within each subset of variates the means are equal, the variances are equal 
and the covariances are equal, and between any two subsets of variates the covariances 
are equal. Testing hypotheses of compound or complete symmetry in a normal F(X) 
is of interest, for example, in studying psychological examinations and in medical research. 

In this paper likelihood ratio criteria are developed for testing various hypotheses 
involving compound symmetry in regard to a normal distribution and to k normal dis- 
tributions (k > 2). Given that the corresponding null hypothesis is true, the moments 
of each criterion are obtained explicitly and the distribution of each criterion is identified 
as the product of independent beta variates (in the case of a single normal distribution, 
the distributions are given explicitly for t = 3, 4, and 5 for certain divisions of the variates 
into subsets). In a previous paper Wilks has given results on a very thorough study of 
the sampling theory of likelihood ratio criteria for various hypotheses involving complete 
symmetry in regard to a normal distribution. 


6. Effects of Non-Normality at High Significance Levels. HAROLD 
HOTELLING, University of North Carolina. 


The effects of non-normality in the underlying population on the probabilities of 
significance by customary statistical tests are not well understood, in spite of numerous 
attacks, both mathematical and experimental, on the problem. Chung’s recent proof that 
the distribution of the Student ratio $t$ has in samples from an arbitrary population a 
distribution approaching normality for large samples tends to confirm the common idea that 
non-normality makes little difference if only the sample is fairly large, but this holds 
only for a fixed range of values of $t$ while the sample number $N$ increases. The tail areas 
beyond a deviation which increases with $N$ in certain ways often behave quite differently 
than in sampling from a normal population. If $p$ is the probability that $|t| > t$ in 
samples of $N$ from a normal population and $p'$ is the corresponding probability for another 
population, it is shown that $\lim_{N \to \infty} (p'/p)$ may be zero or infinite or may take any 
finite value, even when the non-normal distribution involved is of simple and realistic 
continuous forms. The conditions that this limit be unity are concerned only with the 
shoulders of the population histogram, and have nothing to do with its moments or its 
behavior at infinity or at its mean. This suggests that caution should be used in applying 
familiar tests with high significance levels; that further calculations should be directed 
toward making this caution quantitatively definite; and that the use of sample moments 
or cumulants cannot lead to the most appropriate criterion of non-normality for this 
purpose. 


7. On the Problem of Similar Regions. E. L. LEHMANN, University of 
California, Berkeley, and HENRY SCHEFFÉ, University of California, Los Angeles. 


If $X = (X_1, \cdots, X_n)$ is a set of random variables with a joint probability density 
depending on a set of parameters $\theta = (\theta_1, \cdots, \theta_m)$, and if $T = (T_1, \cdots, T_m)$ is a set of 
sufficient statistics for $\theta$, then Neyman (Phil. Trans. Roy. Soc. London, Vol. 236 (1937), 
pp. 333-380) has proved that a region $w$ in the space of $X$ is similar with respect to $\theta$ if it 
has the following structure: The intersections $w(t)$ of $w$ with the surfaces $T = t$ have the 
property that the conditional probability of the sample point $X$ falling into $w$ given that 
$T = t$ does not depend on $t$. In the present paper a necessary and sufficient condition is 
found for the regions with the above structure to be the only similar regions. This 
condition is shown to be satisfied for a certain class $K$ of probability densities which contains 
as special cases all densities for which the totality of similar regions has been previously 
determined. In particular the partial differential equations which Neyman (Annals of 
Math. Stat., Vol. 12 (1941), pp. 46-76) assumed were satisfied in his solution of the problem 
of similar regions are solved and it is shown that any density satisfying these equations 
belongs to the above class $K$. 


8. Fourth Degree Exponential Function. L. A. AROIAN and MARGUERITE 
DARKOW, Hunter College. 


It is shown that the fourth degree exponential function is supported by the Bernoulli 
probability function and the hypergeometric probability function as well as being the 
function for which the method of moments is the best method according to the criterion of 
maximum likelihood. In the general situation six moments, at most, are needed. The 
function is classified into two general groups depending on symmetry or asymmetry and 
each case is divided again into unimodal and bimodal distributions. Examples show that 
the function is very successful in graduating the main Pearson types and the Gram-Charlier 
Type A frequency function. Various generalizations of the exponential function are 
indicated. In addition to its wide generality, the greatest practical advantage of the new 
system is the simplicity of the numerical calculations. 


9. A General Weak Limit Theorem for Independent Distributions. P. L. HSU, 
University of North Carolina. (Read by title.) 


For every positive integer $n$ let there be $n$ distribution functions $F_{n1}(x), F_{n2}(x), 
\cdots, F_{nn}(x)$. Assume that $\lim_{n\to\infty} \max_{1 \leq j \leq n} \{1 - F_{nj}(x) + F_{nj}(-x)\} = 0$. Let $F_n(x)$ 
be the convolution $F_{n1}(x) * F_{n2}(x) * \cdots * F_{nn}(x)$. Let 

$$\psi(t) = imt + \int_{-\infty}^{+\infty} \left[e^{itx} - 1 - \frac{itx}{1 + x^2}\right] \frac{1 + x^2}{x^2} \, dG(x),$$

with $G(x)\uparrow$ and $G(\infty) - G(-\infty) < \infty$. Let $F(x)$ be the 
(infinitely divisible) distribution law having $\exp \psi(t)$ as its characteristic function. 
In order to have $\lim_{n\to\infty} F_n(x) = F(x)$ at every continuity point of $F(x)$, it is necessary 
and sufficient that the following relations hold at every $x > 0$ such that $\pm x$ are continuity 
points of $G(y)$: 

$$\text{(I)} \quad \lim_{n\to\infty} \sum_{j=1}^{n} \int_{|y|>x} dF_{nj}(y) = \int_{|y|>x} \frac{1 + y^2}{y^2} \, dG(y),$$

$$\text{(II)} \quad \lim_{n\to\infty} \sum_{j=1}^{n} \left\{ \int_{|y|<x} y^2 \, dF_{nj}(y) - \left( \int_{|y|<x} y \, dF_{nj}(y) \right)^2 \right\} = \int_{|y|<x} (1 + y^2) \, dG(y),$$

$$\text{(III)} \quad \lim_{n\to\infty} \sum_{j=1}^{n} \int_{|y|<x} y \, dF_{nj}(y) = m + \int_{|y|<x} y \, dG(y) - \int_{|y|>x} \frac{1}{y} \, dG(y).$$


10. On the Maximum Partial Sums of Sequences of Independent Random 
Variables. K. L. CHUNG, Princeton University. 


The asymptotic behavior of the maximum partial sums of a sequence of independent 
random variables is studied in this paper. Two groups of new limit theorems are estab- 
lished under general conditions. The first group deals with theorems of the weak type. 
The limiting distribution of the maximum partial sums is obtained with an estimate of 
the remainder, thus improving a recent result of Erdős and Kac. Another estimate is 
obtained for a different domain of variation, which plays an essential role in the sequel. 
These results correspond to the sharper forms of the central limit theorem. In the second 
group, theorems of the strong type are obtained, giving precise lower bounds (in the sense 
of probability) for the maximum partial sums. These results form the exact counterpart 
to the general form of the law of the iterated logarithm, due to Feller, which give the pre- 
cise upper bounds. A summary of the main results and methods has appeared in Proc. 
Nat. Acad. of Sci., Vol. 33 (1947), pp. 132-136. 


11. Some Results on the Distribution of Quadratic Forms From Gaussian 
Stochastic Processes. (Preliminary report). HERMAN RUBIN, Cowles 
Commission. 


If one considers the estimation of the parameters of a Gaussian stochastic process 
whose elements are continuous functions from the functional values over a finite interval, 
one often finds that certain parameters can be estimated exactly, and certain parameters 
can not. This result often depends on the distribution of quadratic functionals whose 
arguments are elements of the stochastic process under consideration. In this paper, it 
is shown that the elements of a certain class of quadratic functionals have distributions 
concentrated at a point, and that the elements of a different class do not; in this latter case, 
the characteristic function is computed. 


12. Some Significance Tests for the Median which are Valid under Very General 
Conditions. (Preliminary Report) JOHN E. WALSH, Princeton University. 
(Read by title.) 


Consider n independent values drawn from populations necessarily satisfying only: 1) 
Each population has a unique median. 2) The median has the same value ¢ for each popu- 
lation. 3) Each population is symmetrical. 4) Each population is continuous. (It 
is to be emphasized that no two of the values are necessarily drawn from the same popula- 
tion.) Significance tests are derived for g on the basis of 1)-4). These significance tests 
are based on order statistics of certain combinations of order statistics, each combination 
being either a single order statistic of the n values or one-half the sum of two order statistics. 
The tests are invariant under permutation of the n values and reasonably efficient if the 
values represent a sample from a normal population. The significance levels are of the 
form $r/2^n$, ($r = 1, \cdots, 2^n - 1$). Each value of $r$ can be obtained for some one-sided 
significance test. Thus any significance level can be closely approximated if n is large. The 
major disadvantage of these tests is the limited number of suitable significance levels avail- 
able for small values of n. This disadvantage is partially eliminated by the development of 
tests which have a specified significance level if the values are a sample from a normal 
population and a significance level bounded near this specified value if only 1)-4) necessarily 
hold. Results based on 1)-4) are applied to several well known statistical problems: 
Tests are obtained for the mean on the basis of a large number of independent values from 
populations having the mean but little else in common. Also generalized results are ob- 
tained for the Behrens-Fisher problem, quality control, slippage tests, the sign test and 
cases where some of the n values are dependent. 
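
The restriction to levels of the form r/2^n can be made concrete with a short sketch. The function names below are hypothetical and do not come from the abstract; the code only enumerates the attainable levels and picks the one nearest a desired level, and it does not construct the tests themselves.

```python
from fractions import Fraction


def attainable_levels(n):
    """All levels r / 2**n for r = 1, ..., 2**n - 1 (exact fractions)."""
    return [Fraction(r, 2 ** n) for r in range(1, 2 ** n)]


def closest_level(n, target):
    """Attainable level nearest to the desired level `target`."""
    target = Fraction(target)
    return min(attainable_levels(n), key=lambda level: abs(level - target))


if __name__ == "__main__":
    for n in (4, 6, 10):
        level = closest_level(n, Fraction(1, 20))
        print(n, level, float(level))
```

For small n the grid is coarse (with n = 4 the nearest level to .05 is 1/16), which illustrates the disadvantage mentioned in the abstract; for larger n the grid becomes fine enough to approximate any level closely.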


13. Loss of Information in t-tests with Unbalanced Samples. (Preliminary 
Report) John E. Walsh, Princeton University. (Read by title.) 


Consider two normal populations, the first with mean $a_1$ and variance $\sigma_1^2$, the second with 
mean $a_2$ and variance $\sigma_2^2$, while $\sigma_1/\sigma_2$ has a known value $C$. If the hypothesis $a_1 = a_2$ is to 
be tested by a t-test (one-sided or symmetrical) using $n_1$ sample values from the first 
population and $n_2$ values from the second population ($n_1 + n_2 = n$, fixed), it is shown that this 
experiment is most powerful when $n_1/n_2 = \sigma_1/\sigma_2$ (integer considerations neglected). The 
t-tests satisfying this condition will be referred to as balanced t-tests. Thus information 
will be lost by not using a balanced experiment. A quantitative measure of the information 
lost by using given values of $n_1$ and $n_2$ is determined by the total sample size $m$, ($m_1 + m_2 = m$), 
of the balanced t-test (same significance level) which has approximately the same power. 
Then $n - m$ sample values are wasted by using $(n_1, n_2)$ rather than $(m_1, m_2)$, i.e. only 
$100m/n$% of the information obtainable per observation is used by $(n_1, n_2)$. A 
symmetrical t-test with significance level $2\alpha$ has the same value of $m$ as a one-sided t-test with 
significance level $\alpha$. For one-sided t-tests with significance level $\alpha$: $m = \tfrac{1}{2}(B + \sqrt{B^2 - 8A})$, 
where $B = 2 + A + K_\alpha^2/2$, $A = (C + 1)^2 [1 - K_\alpha^2/(2(n - 2))][C^2/n_1 + 1/n_2]^{-1}$, and $K_\alpha$ is the 
standardized normal deviate exceeded with probability $\alpha$. This approximation to $m$ is 
valid for $m > 5$ if $\alpha = .05$, $m > 6$ if $\alpha = .025$, $m > 7$ if $\alpha = .01$, $m > 8$ if $\alpha = .005$. (A 
fractional value of $m$ represents an interpolated measure of the sample size of the equivalent 
balanced experiment.) 
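
As an illustration of the arithmetic, the approximation for m can be evaluated numerically. The sketch below is not part of the abstract: the function name is hypothetical, and the formula is the one quoted above as read from the printed text, with the deviate exceeded with probability alpha obtained from the standard library's normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def equivalent_balanced_size(n1, n2, C, alpha):
    """Approximate total size m of the balanced one-sided t-test having
    about the same power as the test based on (n1, n2).

    C is the known ratio sigma_1 / sigma_2 and alpha the one-sided level.
    """
    n = n1 + n2
    # Standardized normal deviate exceeded with probability alpha.
    K = NormalDist().inv_cdf(1.0 - alpha)
    A = (C + 1.0) ** 2 * (1.0 - K ** 2 / (2.0 * (n - 2))) / (C ** 2 / n1 + 1.0 / n2)
    B = 2.0 + A + K ** 2 / 2.0
    return 0.5 * (B + sqrt(B * B - 8.0 * A))


if __name__ == "__main__":
    m = equivalent_balanced_size(5, 15, C=1.0, alpha=0.05)
    print(f"m = {m:.2f}, information used = {100.0 * m / 20:.1f}%")
```

When the allocation is already balanced (n1/n2 = C) the formula returns m = n, so no observations are counted as wasted; for an unbalanced allocation m falls below n.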





14. Some Theorems on the Bernoullian Multiplicative Process. T. E. Harris, 
Princeton University. (Read by title.) 


A single entity may have $j$ descendants with probability $p_j$, ($j = 0, 1, 2, \cdots$). Each 
first generation entity has then the same procreative probabilities, etc. Let 


$f(s) = p_0 + p_1 s + \cdots$. 


If $z_n$ is the number of entities in the nth generation, it is known that $P(z_n = j)$ is given by 
the coefficient of $s^j$ in the nth iterate $f[f \cdots (f)] = f_n(s)$. Let $E z_1 = x$, $1 < x < \infty$. 
Conditions are given insuring that as $n \to \infty$ the cumulative distribution of the variate $z_n/x^n$ 
approaches a limit-function which is absolutely continuous except for a possible single 
jump. Let $g(u)$ be the corresponding frequency function. If $f(s)$ is a polynomial of degree 
$k$, let $q = \log_x k/(\log_x k - 1)$. Otherwise, $q = 1$. Then $g(u) \cdot \exp\{u^{q+\epsilon}\}$ [is, is not] summable 
$(0, \infty)$ according as $\epsilon$ is [negative, positive]. Behavior of $g(u)$ near $u = 0$ is also considered. 
Special cases are considered where $g(u) = \text{constant} \cdot u^{1/m - 1} \cdot e^{-u/m}$, $m$ a positive integer. 
Maximum likelihood estimates for the parameters $p_0, p_1, \cdots$, and $x$ are obtained as functions 
of $n$ successive values $z_1, z_2, \cdots, z_n$. Consistency, in a certain sense, is proved. A 
specialized method is given for finding the moment-generating function of the variate $N$, 
the smallest value of $n$ such that $z_n = 0$. 
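
The statement that P(z_n = j) is the coefficient of s^j in the nth iterate of the generating function can be checked numerically when only finitely many p_j are non-zero. The following is a minimal sketch under that assumption; the helper names are hypothetical and the offspring probabilities in the example are chosen only for illustration.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def compose(outer, inner):
    """Coefficients of outer(inner(s)), built up by Horner's rule."""
    result = [outer[-1]]
    for c in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += c
    return result


def generation_distribution(p, n):
    """Coefficients of the nth iterate f_n(s); entry j is P(z_n = j).

    p[j] is the probability that one entity has j descendants.
    The process is assumed to start from a single entity, f_0(s) = s.
    """
    dist = [0.0, 1.0]
    for _ in range(n):
        dist = compose(p, dist)
    return dist


if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]            # illustrative offspring law, mean x = 1.3
    dist = generation_distribution(p, 2)
    mean = sum(j * pj for j, pj in enumerate(dist))
    print(dist, mean)              # the mean should be close to x**2
```

The degree of f_n(s) grows like k^n, so this direct expansion is practical only for a few generations; it is meant as a check on small cases rather than as a computational method.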











NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Dr. George E. Albert has been appointed to an associate professorship at the 
University of Tennessee. 

Dr. T. W. Anderson, Jr. has been promoted to an assistant professorship in 
the Department of Mathematical Statistics at Columbia University. He is on 
leave the first half of the 1947-48 academic year at the Institute of Actuarial 
Mathematics and Mathematical Statistics, Stockholm University as a Guggen- 
heim Fellow. During the second half of the academic year he will be at Cam- 
bridge University. 

Associate Professor Max Astrachan has been promoted to a full professorship 
at Antioch College, Yellow Springs, Ohio. 

Associate Professor T. A. Bancroft, who has been at the University of Georgia, 
Athens, Georgia, is now with the Statistical Laboratory, Alabama Polytechnic 
Institute, Auburn, Alabama. 

Dr. M. S. Bartlett of Cambridge University has been appointed as Professor 
of Mathematical Statistics at the University of Manchester, Manchester, 
England. The position is a newly created one. Professor Bartlett indicates 
that this position is believed to be the first official professorship in mathematical 
statistics in England. 

Professor M. A. Brumbaugh has accepted a position with the Bristol Labora- 
tories Inc., Syracuse 1, New York. 

Dr. Donald A. Darling has been appointed Research Associate at Cornell 
University. 

Professor D. B. DeLury of the Virginia Polytechnic Institute has accepted a 
position with the Ontario Research Foundation, 43 Queens Park, Toronto 5, 
Canada. 

Professor Abel Gauthier of the University of Montreal has been appointed 
Head of the Institute of Mathematics and Assistant-Secretary of the Faculty of 
Science at that institution. 

Dr. Casper Goffman, former assistant professor in the Mathematics Depart- 
ment, University of Kentucky, is now in the Mathematics Department, Univer- 
sity of Oklahoma, Norman, Oklahoma. 

Mr. Philip Hardy has returned to the General Electric Company at Warren, 
Ohio after serving at Wright Field. 

Dr. Carl F. Kossack, who has been with the Navy Department in Washington, 
D. C. as an Air Intelligence Specialist, has accepted an associate professorship in 
the Department of Mathematics at Purdue University. 

Mr. Frank Jones Massey, Jr. is now teaching in the Department of Mathe- 
matics, University of Maryland, College Park, Maryland. 

Dr. William Burton Michael, who has been Lecturer in Mathematics, Psy- 
chology and Educational Psychology at the University of Southern California, 
has now accepted an assistant professorship in the Department of Psychology, 
Princeton University. He is also a member of the Research Department, 
College Entrance Examination Board at Princeton. 

Mr. Bernard Ostle, a former teaching assistant, School of Business Adminis- 
tration, University of Minnesota, is now at Iowa State College, Ames, Iowa. 

Mr. Maurice H. Quenouille, who was formerly with the Rothamsted Experimental 
Station, Harpenden, Herts, England, has accepted the position of 
Lecturer in Statistics, Marischal College, University of Aberdeen, Scotland. 

Dr. James A. Rafferty left the Department of Pathology, University of 
Rochester in June and has been appointed Chief of the Department of Statistics, 
Air University, School of Aviation Medicine, Randolph Field, Texas. 

Miss Mary Ann Savas has accepted a position with General Motors, Detroit, 
Michigan. 

Professor George J. Stigler, formerly with Brown University, is now teaching 
in the Department of Economics, Columbia University, New York, New York. 

Professor E. L. Welker has resigned an associate professorship in mathematics 
at the University of Illinois to become Associate in Mathematics in the Bureau of 
Medical Economic Research of the American Medical Association. 

Mr. Sol M. Wezelman, who completed his master’s degree in actuarial science 
at the University of Michigan in June, has accepted a position as Assistant 
Actuary in the North Dakota State Department of Insurance, Bismarck. 

Dr. Bertram Yood has received his doctorate at Yale and is now on the staff 
at Cornell University. 

Mr. Earl K. Yost, Jr. has accepted a position with the General Electric Co. at 
the Hanford Engineering Project, Richland, Washington. 

Professor James G. Smith, of Princeton University, died at Princeton on 
November 28, 1946. 



Beginning with the October issue, the quarterly journal Mathematical Tables 
and Other Aids to Computation will publish a new feature section, “Automatic 
Computing Machinery,” designed to disseminate information and news on 
research and development in the field of high-speed automatic calculating 
machinery. Material should fall under the general headings of Bibliography, 
Technical Developments, Discussion (including correspondence), and News. 
Contributions to this section are invited and should be addressed to Dr. E. W. 
Cannon, Head of the Mathematics Group, Machine Development Laboratory, 
National Bureau of Standards, Washington, D. C. 




Institute of Numerical Analysis Established 


Plans have been completed for the establishment of one of the newest units of 
the National Bureau of Standards—the Institute of Numerical Analysis—at the 
University of California at Los Angeles, according to an announcement by Dr. 
Edward U. Condon, Director of the Bureau. 

One of the giant high-speed electronic computing machines, now under devel- 
opment by the Bureau of Standards, will be installed at the Institute when 
completed. Design specifications call for high memory capacity and auto- 
matically sequenced mathematical operations from start to finish at speeds 
attainable only with electronic equipment. 

The Institute has two primary functions. The first is research in applied 
mathematics aimed at developing methods of analysis which will extend the use 
of the high-speed electronic computers. The second is to act as a service group 
for Western industries, research institutions, and government agencies. The 
service function will include not only the use of the machines for problem solving 
but also assistance in the formulation of problems in applied mathematics of the 
more complex and novel types. Service operations are to be initiated immedi- 
ately, using the latest types of commercially available computing equipment. 

The decision to locate the Institute at the University of California at Los 
Angeles was made after a nation-wide survey by the National Bureau of Stand- 
ards. Centers in the East and Middle West were considered as well as the Far 
West, but Los Angeles, it was decided, offered the widest range of possibilities 
for an Institute of Numerical Analysis. Concentration of aircraft industries and 
the presence of several major scientific institutions were critical in the choice of 
Los Angeles. 



Election of Fellows 


The Board of Directors announced at the Yale Meeting the election of the 
following members as Fellows of the Institute: Theodore W. Anderson, Jr., 
Alexander C. Aitken, David H. Blackwell, Georges Darmois, Ragnar Frisch, 
Robert C. Geary, Frederick Mosteller, Gerhard Tintner, Charles P. Winsor and 
John Wishart. 


New Members 


The following persons have been elected to membership in the Institute 
(June 1 to August 29, 1947) 

Baldwin, Helen Mildred, B.S. (Cornell) Research Associate in Statistics, Atomic Energy 
Project, 215 Avenue C, Rochester 5, N.Y. 

Blunk, Paul M., A.B. Teaching asst. and grad. student, Univ. of Calif., Box 632, Fair 
Oaks, Calif. 

Bowden, George Edwin, B.S. (Duke) Teaching asst., Math. Dept., White Hall, Cornell 
Univ., Ithaca, N. Y. 

Bradley, Ralph Allan, M.A. (Queen’s Univ.) Grad. student, Univ. North Carolina, Well- 
ington, Ontario, Canada. 

Burton, Kenneth John, Head of Statistics Section, British Employers’ Confederation, 16 
Rutherwyke Close, Ewell, Surrey, England. 

Carlson, Phillip G., Jr., A.M. (Columbia) 148 Cornell Street, Roslindale 31, Mass. 

Carol, Bernard, M.S.E. (Columbia) Graduate student at Columbia Univ., 15 West 96th 
Street, N.Y. 

Clark, Sidney B., B.A. (George Wash. Univ.) Statistician, Public Roads Administration, 
2728 Porter St., N.W., Wash., 8, D.C. 

Danielson, Theresa, M.A. (Univ. of Ill.) Mathematician at Brookhaven National Labora- 
tory, 8512 Cambridge Ave., New York 68, N.Y. 

Dineen, Russell D. F., B.A. (Univ. of Delaware) Graduate student at Univ. of Delaware, 
1318 French Street, Wilmington, Delaware. 

Diver, M. L., M.E. (Purdue) Consulting Engineer, P.O. Box 1016, San Antonio 6, Texas. 

Erasmus, Josias C., M.S... (Univ. of Stellenbosch, South Africa) Research Officer, 
Grootfontein College of Agriculture, Middelburg, C-P, South Africa. 

Gottlieb, Morris J., Ph.D. (Wash. Univ., St. Louis) Member of the Institute for Advanced 
Study, Washington University, St. Louis, Mo. 

Greenwood, Joseph Arthur, A.B. (Harvard) Student at Harvard University, 66 Oxford 
St., Cambridge 88, Mass. 

Gysbers, Jack C., M.A. (Univ. of Calif.) Teaching asst., Dept. of Math., Univ. of Calif., 
2029 Berkeley Way, Berkeley 4, Calif. 

Haskind, Mina, B.S. (Brooklyn College) Student at Brooklyn College, 763 Eastern Park- 
way, Brooklyn 13, New York. 

Hauser, Dr. Philip M., Ph.D. (Univ. of Chicago) University of Chicago, Chicago 37, Ill. 

Hoyt, Cyril J., Ph.D. (Univ. of Minn.) Research Associate, Dept. of Education, 
University of Chicago, Chicago, Ill. 

Kern, Enrique Roberto, First Assistant, Institute of Biometry, Univ. of Buenos Aires, 
Rivadavia 8854, Buenos Aires, Argentina. 

Mark, Abraham M., Ph.D. (Cornell) Mathematics Department, Univ. of Wisconsin, 
Madison, Wisconsin. 

Moss, George G., II, B.A. (St. John’s College, Annapolis) Actuarial Statistician, Metro- 
politan Life, 2771 Morris Ave., N. Y. 58, N.Y. 

Phillips, Bernard E., A.M. (Columbia) Box 147, Cathedral Station, New York 25, N.Y. 

Radvanyi, Laszlo, Ph.D. (Univ. of Heidelberg) Professor of Economics, National Univ. of 
Mexico, Donato Guerra 1, desp. 207, Mexico, D. F. 

Richardson, John M., Ph.D. (Cornell) Member of Technical Staff, Bell Telephone Lab- 
oratories, Inc., Murray Hill, New Jersey. 

Royston, Robert W., M.S. (Univ. of Mich.) Asst. Prof., Math. Dept., Wash. & Lee Univ., 
117 W. Washington St., Lexington, Virginia. 

Savas, Mary A., A.B. (Univ. of Mich.) Student at Univ. of Mich., 524 F. Second St., Mon- 
roe, Mich. 

Shepard, David H., A.B. (Univ. of Mich.) Research Analyst, Army Security Agency, 
505 Randolph Street, Falls Church, Virginia. 

Throdahl, Monte C., B.S. (Iowa State College) Research Chemist in Charge of Rubber 
Lab., Monsanto Chemical Co., Nitro, West Virginia. 

Uchytil, Jan, Doctor of Science, Chief of Production Control Dep. in Central Federation 
of Czech. Industry, Praha II, Prikopy 14, Czech. 

Vergara, Jose, Doctor of Engineering, (Madrid) Professor of Economics, Madrid, Chief 
of the Bureau of Statistics, Dept. of Agric., Madrid. 6262 S. Blackstone Ave., Chicago 
37, Illinois. 

Wei, Dzung-shu, Ph.D. (Univ. of Iowa) Prof. and Head of Math. Dept., St. John’s Univ., 
Shanghai, 129 East 10th St., New York 3, N.Y. 

Wolfson, Jacob, B.A. (New York College) Statistician, Social Security Adm., 845 
Brunswick Road, Essex, Maryland. 











REPORT ON THE NEW HAVEN MEETING OF THE INSTITUTE 


The Tenth Summer Meeting of the Institute of Mathematical Statistics was 
held at Yale University, New Haven, Connecticut, Tuesday, September 2 
through Thursday, September 4, 1947. The meeting was held in conjunction 
with the summer meetings of the American Mathematical Society and the 
Mathematical Association of America. The following 150 members of the 
Institute attended the meeting: 


C. B. Allendoerfer, R. L. Anderson, H. ki. Arnold, L. A. Aroian, H. M. Bacon, J. L. Barnes, 
W. D. Baten, R. E. Bechhofer, A. A. Bennett, Joseph Berkson, D. H. Blackwell, C. I. Bliss, 
Colin Blyth, Jr., A. E. Brandt, G. M. Brown, R. H. Brown, O. P. Bruno, P. T. Bruyere, 
Mrs. P. T. Bruyere, J. H. Bushey, B. H. Camp, G. C. Campbell, Uttam Chand, K. L. Chung, 
W. G. Cochran, A. C. Cohen, Jr., E. P. Coleman, T. F. Cope, G. M. Cox, C. C. Craig, E. L. 
Crow, H. B. Curry, G. B. Dantzig, M. D. Darkow, B. B. Day, Bernard Dimsdale, C. E. 
Dieulefait, C. W. Dunnett, Churchill Eisenhart, L. R. Elveback, M. W. Eudey, H. P. Evans, 
William Feller, C. D. Ferris, M. M. Flood, R. M. Foster, H. A. Freeman, J. E. Freund, H. P. 
Geiringer, M. J. Gottlieb, J. Arthur Greenwood, Evelyn Groosman, F. E. Grubbs, H. T. 
Guard, P. R. Halmos, Max Halperin, M. H. Hansen, B. I. Hart, Mina Haskind, Wassily 
Hoeffding, R. H. Hoskins, Harold Hotelling, A. S. Householder, Jaroslav Janko, Irving 
Kaplansky, Leo Katz, Oscar Kempthorne, E. M. Kennedy, W. L. Kichline, C. J. Kirchen, 
L. F. Knudsen, H. S. Konijn, C. F. Kossack, Jack Laderman, H. G. Landau, E. L. Lehmann, 
R. A. Leibler, Walter Leighton, Jr., F. C. Leone, Joseph Lev, Howard Levene, Julius Leiblein, 
Arthur Linder, S. B. Littauer, E. D. Lowry, H. F. MacNeish, P. J. McCarthy, John 
Mandel, H. B. Mann, Sophie Marcuse, F. J. Massey, Margaret Merrell, E. B. Mode, M. E. 
Moore, Frederick Mosteller, D. N. Nanda, P. M. Neurath, M. C. Neurdenburg, G. E. Noether, 
M. L. Norden, H. W. Norton, P. S. Olmstead, A. L. O’Toole, E. R. Ott, T. E. Oxtoby, 
Edward Paulson, M. P. Peisakoff, G. B. Price, J. A. Rafferty, L. J. Reed, C. J. Rees, P. R. 
Rider, John Riordan, H. E. Robbins, Milton da Silva Rodrigues, A. C. Rosander, Ernest 
Rubin, Herman Rubin, Frank Saidel, M. M. Sandomire, Arthur Sard, Max Sasuly, F. E. 
Satterthwaite, E. D. Schell, Jack Sherman, Rosedith Sitgreaves, Andrew Sobczyk, Milton 
Sobel, Herbert Solomon, Mortimer Spiegelman, Arthur Stein, Henry Teicher, R. M. Thrall, 
Gerhard Tintner, M. N. Torrey, J. W. Tukey, D. F. Votaw, Jr., Abraham Wald, H. M. 
Walker, J. E. Walsh, R. M. Walter, J. H. Watkins, Dzung-shu Wei, E. S. Weiss, S. S. Wilks, 
C. P. Winsor, H. O. Wold, Jacob Wolfowitz, C. A. Wright, Bertram Yood. 


The Tuesday afternoon session was devoted to a symposium on 2 x 2 tables 
with Professor Lowell J. Reed of Johns Hopkins University serving as chairman. 
Addresses were given on Tests of Significance by Dr. Churchill Eisenhart, Na- 
tional Bureau of Standards; Estimation by Dr. Charles P. Winsor, Johns Hopkins 
University and Non-Standard-Cases by Dr. Joseph Berkson, Mayo Clinic. 
Discussants were Mr. William F. Taylor, Dr. Frederick Mosteller, Professor 
David H. Blackwell and Professor John W. Tukey. The attendance was 
approximately 130. 

The first Wednesday morning session was devoted to contributed papers. 
Professor John W. Tukey of Princeton University presided. The attendance 
was approximately 85. The following three papers were presented: 

1. Estimation of Parameters in Truncated Pearson Frequency Distributions. 
Professor A. C. Cohen, University of Georgia. 

2. Distribution of a Root of a Determinantal Equation. 
Mr. D. N. Nanda, University of North Carolina. 

3. The Power of Certain Non-Parametric Tests of Independence. 
Dr. Wassily Hoeffding, University of North Carolina. 


The second Wednesday morning session was held with Professor Will Feller, 
President of the Institute, presiding. Professor R. A. Fisher, University of 
Cambridge, gave the address under the title The Fitting of Gene Frequencies to 
Data for Genotypes. The attendance was approximately 160. 

The membership business meeting of the Institute was held at 9:15, Thursday 
morning, in 102 Chittenden Hall with President Feller presiding. The attend- 
ance was approximately 55. It was voted to make certain changes in the 
By-Laws and in particular to raise the dues to $7.00 per year. (An exception is 
made for those living outside the Western Hemisphere.) Morris Hansen, 
Chairman of the Committee on Planning and Development, initiated a lively 
discussion with reference to desirable changes in the Constitution. 

On Thursday morning at 10:30, with President Feller presiding, Professor A. 
Wald of Columbia University presented the Henry Lewis Rietz Lecture on 
Sequential Estimation and Multi-Decisions. The attendance was approximately 
150. 

A joint session with the American Mathematical Society was held early 
Thursday afternoon at which Professor S. S. Wilks of Princeton University gave 
a lecture on Sampling Theory of Order Statistics. Professor Harold Hotelling of 
the University of North Carolina was the presiding officer. The attendance was 
approximately 300. 

This session was followed by another joint session with the American Mathe- 
matical Society which was devoted to contributed papers. Professor John W. 
Tukey presided at this session and the attendance was approximately 115. The 
following seven papers were presented: 


1. Some Significance Tests for the Mean Using the Sample Range and Midrange. 
Mr. John Walsh, Princeton University. 
2. Testing Compound Symmetry in a Normal Multivariate Distribution. 
Dr. David F. Votaw, Jr., Princeton University. 
3. Effects of Non-Normality at High Significance Levels. Professor Harold Hotelling, 
University of North Carolina. 
4. On the Problem of Similar Regions. 
Dr. Erich L. Lehmann, University of California, Berkeley and Professor Henry 
Scheffe, University of California at Los Angeles. 
5. The Fourth Degree Exponential Function. 
Dr. Leo A. Aroian and Professor Marguerite Darkow, Hunter College. 
6. On the Maximum Partial Sums of Sequences of Independent Distributions. 
Dr. K. L. Chung, Princeton University. 
7. Some Results on the Distribution of Quadratic Forms from Gaussian Stochastic 
Processes. 
Mr. Herman Rubin, Cowles Commission. 

The following four papers were presented by title: 

8. A General Weak Limit Theorem for Independent Distributions. 
Professor P. L. Hsu, University of North Carolina. 

9. Some Significance Tests for the Median which are Valid under Very General 
Conditions (Preliminary Report). 
Mr. John E. Walsh, Princeton University. 

10. Loss of Information in t-tests with Unbalanced Samples (Preliminary Report). 
Mr. John E. Walsh, Princeton University. 

11. Some Theorems on the Bernoullian Multiplicative Process. 
Mr. T. E. Harris, Princeton University. 


Abstracts of all these papers appear elsewhere in this issue of the Annals. 
A beer party in honor of the foreign statisticians attending the meeting was 
held in the dining room of Saybrook College on Wednesday evening. A joint 
dinner with the American Mathematical Society and the Mathematical Associ- 
ation of America was held on Thursday evening. 
C. C. Craig, 
Acting Secretary. 


BIOMETRIKA 


A Journal for the Statistical Study of Biological Problems 
Vol. XXXIV, Parts I and II, January, 1947 


CONTENTS 


I. The variance of the overlap of geometrical figures with reference to a bombing problem. By F. Garwood. 
II. A study of a first dynasty series of Egyptian skulls from Sakkara and of an eleventh dynasty series from 
Thebes. By A. Batrawi and G. M. Morant. III. The generalization of ‘Student’s’ problem when several 
different population variances are involved. By B. L. Welch. IV. The distribution of Kendall’s τ coefficient 
of rank correlation in rankings containing ties. By G. P. Sillitto. V. The use of range in place of standard 